Why am I losing 9 values in this pandas DataFrame assignment?

Posted by Levon in Dev

Levon

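I'm trying to update some numeric columns in my original dataframe (df) with new normalized values (ndf2). There are 333 rows of non-null values. After the assignment, 9 of my numeric values are NaN. I suspect something is wrong with my assignment operation, or is it an indexing problem? How do I do this correctly?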

ndf2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 333 entries, 0 to 332
Data columns (total 4 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   bill_length_mm     333 non-null    float64
 1   bill_depth_mm      333 non-null    float64
 2   flipper_length_mm  333 non-null    float64
 3   body_mass_g        333 non-null    float64
dtypes: float64(4)
memory usage: 10.5 KB

and

df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 333 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            333 non-null    object 
 1   island             333 non-null    object 
 2   bill_length_mm     333 non-null    float64
 3   bill_depth_mm      333 non-null    float64
 4   flipper_length_mm  333 non-null    float64
 5   body_mass_g        333 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 20.8+ KB

This is where the problem is, but I'm not sure how to do it correctly:

df.iloc[:,2:-1] = ndf2  # is this the best way to do this?

because after this:

df.info(), df.shape

<class 'pandas.core.frame.DataFrame'>
Int64Index: 333 entries, 0 to 343
Data columns (total 7 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            333 non-null    object 
 1   island             333 non-null    object 
 2   bill_length_mm     324 non-null    float64
 3   bill_depth_mm      324 non-null    float64
 4   flipper_length_mm  324 non-null    float64
 5   body_mass_g        324 non-null    float64
 6   sex                333 non-null    object 
dtypes: float64(4), object(3)
memory usage: 20.8+ KB
(None, (333, 7))

I'm left with 324 non-null numeric values. I'm also confused by the different index ranges reported for the two dataframes: RangeIndex: 333 entries, 0 to 332 versus Int64Index: 333 entries, 0 to 343.

The dataset originally started with 344 entries, but after

df.dropna(inplace=True)
df.reset_index(drop=True)

it went down to 333, as I expected.

Update: it seems that if I do df.reset_index(drop=True, inplace=True) instead, this fixes the problem.
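A likely reason the update helps: reset_index returns a new dataframe by default, so calling it without inplace=True (and without assigning the result) leaves df with its old labels. Here is a minimal sketch on a hypothetical three-row frame; the column name x and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with one missing value in the middle row.
df = pd.DataFrame({"x": [1.0, None, 3.0]})

df.dropna(inplace=True)          # removes the row labelled 1
df.reset_index(drop=True)        # returns a NEW frame; the result is discarded here
labels_before = list(df.index)   # df still carries the old labels

df = df.reset_index(drop=True)   # keep the result (or pass inplace=True)
labels_after = list(df.index)    # labels are now consecutive

print(labels_before, labels_after)
```

Reassigning the result and passing inplace=True have the same effect on the index; which one to use is mostly a matter of style.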

真空库

Your indexes are probably not aligned. You can check with:

df.index.equals(ndf2.index)
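To see how a label mismatch turns into NaN, here is a small sketch with made-up data (a five-row frame that loses one row to dropna, and a four-row ndf2 with a fresh RangeIndex). The frame contents are hypothetical; only the column name body_mass_g is borrowed from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: dropna removes label 1, leaving labels 0, 2, 3, 4.
df = pd.DataFrame({"species": list("abcde"),
                   "body_mass_g": [1.0, np.nan, 3.0, 4.0, 5.0]})
df = df.dropna()

# Stand-in for the normalized frame: same row count, labels 0..3.
ndf2 = pd.DataFrame({"body_mass_g": [10.0, 20.0, 30.0, 40.0]})

aligned = df.index.equals(ndf2.index)       # are the labels identical?
missing = df.index.difference(ndf2.index)   # labels in df with no partner in ndf2

out = df.copy()
out[["body_mass_g"]] = ndf2[["body_mass_g"]]    # assignment matches rows by label
n_nan = int(out["body_mass_g"].isna().sum())    # one NaN per unmatched label

print(aligned, list(missing), n_nan)
```

In this sketch the row labelled 4 has no partner in ndf2 and becomes NaN. Note also that the row labelled 2 receives ndf2's value at label 2 rather than the value at the second position, so a misaligned assignment can shift values silently in addition to producing NaN.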

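If the indexes are not equal, you can reset them with:

# drop=True discards the old labels instead of adding them back as an "index" column
df.reset_index(drop=True, inplace=True)
ndf2.reset_index(drop=True, inplace=True)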

Then assign the values:

df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']] = \
ndf2[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']]

Alternatively, if your datasets have the same number of rows, the following works without index alignment:

df[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']] = \
ndf2[['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']].to_numpy()
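As a rough illustration of the positional approach, the sketch below uses invented values and a deliberately gappy index on df. Converting the right-hand side with to_numpy() strips its labels, so rows are matched by position and df keeps its own index.

```python
import pandas as pd

cols = ["bill_length_mm", "body_mass_g"]

# Hypothetical data: df has non-consecutive labels, as it would after dropna().
df = pd.DataFrame({"species": ["Adelie", "Gentoo", "Chinstrap"],
                   "bill_length_mm": [39.1, 47.5, 49.0],
                   "body_mass_g": [3750.0, 5200.0, 3800.0]},
                  index=[0, 2, 5])

# Stand-in for the normalized values, with a default RangeIndex 0..2.
ndf2 = pd.DataFrame({"bill_length_mm": [0.1, 0.5, 0.9],
                     "body_mass_g": [0.2, 0.8, 0.3]})

# Positional assignment is only meaningful if both frames have the same
# number of rows in the same order.
assert len(df) == len(ndf2)

df[cols] = ndf2[cols].to_numpy()   # plain array: no label alignment happens
print(df)
```

This relies on the rows of both frames being in the same order, which pandas cannot verify once the labels are gone, so it is worth checking that assumption before using it.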
