Pandas concat of DataFrames with different columns: AttributeError: 'NoneType' object has no attribute 'is_extension'


Question

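I am trying to concatenate two DataFrames with different column names along axis 0. I found a similar question here, "How to use join_axes in the column-wise axis concatenation using pandas DataFrame?", but because my two DataFrames have different column names, that solution does not work for me. My original data is too large to post here, so the following example illustrates what I am trying to do: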

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randint(0,100,size=(1, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0,100,size=(1, 4)), columns=list('EFGH'))

#df1
    A   B   C   D
0   26  39  7   44

#df2
    E   F   G   H
0   12  44  26  64

pd.concat([df1,df2],axis=0).reset_index(drop=True)
# desired output looks like this
      A     B    C     D     E     F     G     H
0  26.0  39.0  7.0  44.0   NaN   NaN   NaN   NaN
1   NaN   NaN  NaN   NaN  12.0  44.0  26.0  64.0

The code above works perfectly. However, as soon as I substitute my own DataFrames for df1 and df2, using exactly the same syntax, I get an error.

# my real dfs are called data1 & data2, I tried setting ignore_index=True and ignore_index=False
pd.concat([data1, data2],axis=0, ignore_index=True)

This results in the following error:

 ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-194-dbee1fd0bdea> in <module>
    ----> 1 pd.concat([data1, data2],axis=0, ignore_index=True)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
        224                        verify_integrity=verify_integrity,
        225                        copy=copy, sort=sort)
    --> 226     return op.get_result()
        227 
        228

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\reshape\concat.py in get_result(self)
        421             new_data = concatenate_block_managers(
        422                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
    --> 423                 copy=self.copy)
        424             if not self.copy:
        425                 new_data._consolidate_inplace()

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
       5414                 values = values.view()
       5415             b = b.make_block_same_class(values, placement=placement)
    -> 5416         elif is_uniform_join_units(join_units):
       5417             b = join_units[0].block.concat_same_type(
       5418                 [ju.block for ju in join_units], placement=placement)

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in is_uniform_join_units(join_units)
       5438         # no blocks that would get missing values (can lead to type upcasts)
       5439         # unless we're an extension dtype.
    -> 5440         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
       5441         # no blocks with indexers (as then the dimensions do not fit)
       5442         all(not ju.indexers for ju in join_units) and

    ~\AppData\Local\Continuum\anaconda3\envs\tensorflow-gpu\lib\site-packages\pandas\core\internals.py in <genexpr>(.0)
       5438         # no blocks that would get missing values (can lead to type upcasts)
       5439         # unless we're an extension dtype.
    -> 5440         all(not ju.is_na or ju.block.is_extension for ju in join_units) and
       5441         # no blocks with indexers (as then the dimensions do not fit)
       5442         all(not ju.indexers for ju in join_units) and

    AttributeError: 'NoneType' object has no attribute 'is_extension'

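I don't really understand what this error message is trying to tell me. I tried calling fillna on both DataFrames so that there would no longer be any 'NoneType' values: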

data2 = data2.fillna(999)
data1 = data1.fillna(999)

However, I still get the same error message.

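The two DataFrames I am working with are very large, so unfortunately I cannot post a complete example here. They contain only integers, floats, and strings, so nothing exotic should be causing the error. Any ideas about what might cause this error, or what I could check to narrow down the problem?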

Thank you very much!


Answer:

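It turned out that the problem was simply duplicate column names in one of my DataFrames. Getting rid of those duplicates solved the problem, and the code above now works perfectly.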
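
For anyone hitting the same error, here is a minimal sketch of how the duplicate column names could be found and dropped before concatenating. The small `data1` and `data2` frames below are hypothetical stand-ins for the real data, and keeping only the first occurrence of each name is an assumption; renaming the columns may be the better choice if the duplicates hold different values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real frames: column "A" appears twice in data1.
data1 = pd.DataFrame(np.arange(6).reshape(2, 3), columns=["A", "B", "A"])
data2 = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["E", "F"])

# Index.duplicated() flags every repeat of a label after its first occurrence.
dup_mask = data1.columns.duplicated()
duplicate_names = data1.columns[dup_mask].tolist()
print("duplicate column names:", duplicate_names)

# Keep only the first occurrence of each column name (an assumed policy).
data1_unique = data1.loc[:, ~dup_mask]

# With unique column names on both sides, the row-wise concat goes through.
result = pd.concat([data1_unique, data2], axis=0, ignore_index=True)
print(result)
```

A quicker check is `data1.columns.is_unique`, which is False whenever any column name repeats.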