Pandas IndexError on a large DataFrame

Posted by debugcn in Dev

达·L。

I get an IndexError when I try to add a new column to a large DataFrame. Can anyone help me resolve this error?

>vec
                 0        1        2        3        4        5        6 
V1.UC8.0         0        0        0        0        0        0        0   
V1.UC48.0        0        0        0        0        0        0        0   

                 7        8        9         ...     2546531  2546532  2546533  
V1.UC8.0         0        0        0   ...           0        0        0   
V1.UC48.0        0        0        0   ...           0        0        0   

               2546534  2546535  2546536  2546537  2546538  2546539  2546540  
V1.UC8.0         0        0        0        0        0        0        0  
V1.UC48.0        0        0        0        0        0        0        0  

[2 rows x 2546541 columns]

> vec['ToDrop']=0


    IndexError                                Traceback (most recent call last)
<ipython-input-40-9868611037ed> in <module>()
----> 1 vec['ToDrop']=0

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in __setitem__(self, key, value)
   2115         else:
   2116             # set column
-> 2117             self._set_item(key, value)
   2118 
   2119     def _setitem_slice(self, key, value):

C:\Anaconda\lib\site-packages\pandas\core\frame.pyc in _set_item(self, key, value)
   2193         self._ensure_valid_index(value)
   2194         value = self._sanitize_column(key, value)
-> 2195         NDFrame._set_item(self, key, value)
   2196 
   2197         # check if we are modifying a copy

C:\Anaconda\lib\site-packages\pandas\core\generic.pyc in _set_item(self, key, value)
   1188 
   1189     def _set_item(self, key, value):
-> 1190         self._data.set(key, value)
   1191         self._clear_item_cache()
   1192 

C:\Anaconda\lib\site-packages\pandas\core\internals.pyc in set(self, item, value, check)
   2970 
   2971         try:
-> 2972             loc = self.items.get_loc(item)
   2973         except KeyError:
   2974             # This item wasn't present, just insert at end

C:\Anaconda\lib\site-packages\pandas\core\index.pyc in get_loc(self, key, method)
   1435         """
   1436         if method is None:
-> 1437             return self._engine.get_loc(_values_from_object(key))
   1438 
   1439         indexer = self.get_indexer([key], method=method)

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3824)()

pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:3578)()

pandas\src\util.pxd in util.get_value_at (pandas\index.c:15287)()

IndexError: index out of bounds

I also tried adding a new row to the transposed DataFrame (vec.T), but I get the same error.

迪尔

This is indeed strange.

As a workaround, you can use the following:

vec = pd.merge(vec, pd.DataFrame([0, 0], columns=["new"], index=vec.index), right_index=True, left_index=True)  # Optional: pass copy=False

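Make sure the new one-column DataFrame has the same index as vec; the merge matches rows by index label, so rows whose labels do not match are dropped.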
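
A minimal sketch of this workaround on a small stand-in frame, assuming a reasonably recent pandas. The names `vec_small` and `new_col` are made up for illustration. Building the one-column frame from `vec_small.index` guarantees that the two indexes match, so the index-on-index merge keeps every row.

```python
import numpy as np
import pandas as pd

# Small stand-in for the wide frame in the question (hypothetical data).
vec_small = pd.DataFrame(
    np.zeros((2, 5), dtype=int),
    index=["V1.UC8.0", "V1.UC48.0"],
)

# One-column frame that reuses the index of vec_small.
new_col = pd.DataFrame({"ToDrop": 0}, index=vec_small.index)

# Index-on-index merge; rows are matched by label.
merged = pd.merge(vec_small, new_col, left_index=True, right_index=True)
print(merged.shape)
```

If the new frame were built with a default integer index instead, no labels would match and the default inner merge would return no rows.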

Hopefully someone can provide a proper answer.

More on why this is strange:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((2, 2546540)))
df[2546540] = 0

Output: the same IndexError as in the OP's post.

df["blah"] = 0

Output:

TypeError: unorderable types: numpy.ndarray() < str()

Meanwhile, the same operation works fine on a small DataFrame:

df = pd.DataFrame(np.zeros((2, 200)))
df[200] = 0

The output is exactly as expected:

   0    1    2    3    4    5    6    7    8    9   ...   191  192  193  194  
0    0    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   
1    0    0    0    0    0    0    0    0    0    0 ...     0    0    0    0   

   195  196  197  198  199  200  
0    0    0    0    0    0    0  
1    0    0    0    0    0    0  

[2 rows x 201 columns]
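
To check whether a given pandas installation shows the size-dependent behaviour above, a small helper along these lines could be used. This is only a sketch: `try_add_column` is a hypothetical name, and whether the wide case fails is assumed to depend on the installed pandas version.

```python
import numpy as np
import pandas as pd


def try_add_column(n_cols, key=None):
    """Add one column to a 2 x n_cols frame of zeros.

    Returns "ok" on success, otherwise the name of the exception raised.
    """
    df = pd.DataFrame(np.zeros((2, n_cols)))
    if key is None:
        key = n_cols  # next integer label, as in the examples above
    try:
        df[key] = 0
    except Exception as exc:
        return type(exc).__name__
    return "ok"


print(pd.__version__)
print(try_add_column(200))
# Uncomment to try the wide case from the question (allocates roughly 40 MB):
# print(try_add_column(2546540))
# print(try_add_column(2546540, key="blah"))
```

Reporting the pandas version next to the result makes it easier to compare behaviour across installations.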

Hope this helps, and that someone can explain this pandas behavior.
