DataFrame: drop consecutive duplicate rows where the key columns have the same content

Posted by debugcn in Dev

Mark K

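I have the DataFrame below and want to delete consecutive duplicate rows, where "duplicate" means the "People", "Year" and "Project" values are the same.

Given the original DataFrame below, rows that directly follow a row with the same "People", "Year" and "Project" should be deleted.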

import pandas as pd

data = {'People' : ["David","David","David","David","John","John","John"],
'Year': ["2016","2016","2017","2016","2016","2017","2017"],
'Project' : ["TN","TN","TN","TN","DJ","DM","DM"],
'Earning' : [878,682,767,620,964,610,772]}
df = pd.DataFrame(data)

I tried this, but it doesn't work:

df_1 = df.loc[(df['People', 'Year', 'Project'].shift() != df['People', 'Year', 'Project'])]
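A likely reason the attempt above fails, shown as a small check on the question's data: `df['People', 'Year', 'Project']` is read as a lookup of one column whose label is the tuple `('People', 'Year', 'Project')`, so pandas raises a `KeyError`. Several columns are selected with a list (double brackets). Even with that fixed, comparing a frame with its shift gives one boolean per cell, not one per row, so the result still has to be reduced to a per-row mask before it can filter rows.

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)

# A tuple inside single brackets is treated as ONE column label -> KeyError
try:
    df['People', 'Year', 'Project']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# A list inside the brackets selects several columns
sub = df[['People', 'Year', 'Project']]

# Comparing with the shifted frame is element-wise: a 7 x 3 boolean DataFrame,
# not a single True/False per row
cellwise = sub.shift() != sub
print(tuple_lookup_failed, cellwise.shape)   # True (7, 3)
```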

I also tried this, but it removes the non-consecutive row "David, 2016, TN, 620":

df_1 = df.drop_duplicates(subset=['People','Year','Project'])

After changing it to this, it keeps all rows:

df_1 = df.drop_duplicates(subset=['People','Year','Project', 'Earning'])
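Why `drop_duplicates` does not fit this task: it compares each row against every earlier row, not only the one directly above it. Row 3 (David, 2016, TN) repeats row 0, so it is dropped even though row 2 sits in between. And because every `Earning` value in the sample is distinct, adding that column to `subset` makes no row a duplicate of another. A small check on the question's data:

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)

# Global de-duplication on the three key columns keeps the first occurrence
# of each combination anywhere in the frame, so row 3 is removed as well
on_keys = df.drop_duplicates(subset=['People', 'Year', 'Project'])
print(on_keys.index.tolist())   # [0, 2, 4, 5]

# Every Earning value is different, so no row duplicates another
with_earning = df.drop_duplicates(subset=['People', 'Year', 'Project', 'Earning'])
print(len(with_earning))        # 7
```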

What is the correct way to do this? Thanks!

jezrael

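You can compare the columns with their values shifted by one row (DataFrame.shift) using "not equal" (DataFrame.ne), then test whether each row has at least one True with DataFrame.any(axis=1), and filter with boolean indexing: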

cols = ['People','Year','Project']
df_1 = df[df[cols].ne(df[cols].shift()).any(axis=1)]
print (df_1)
  People  Year Project  Earning
0  David  2016      TN      878
2  David  2017      TN      767
3  David  2016      TN      620
4   John  2016      DJ      964
5   John  2017      DM      610
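The same one-liner wrapped in a reusable function, as a sketch; the name `drop_consecutive_duplicates` is hypothetical and not part of pandas. Row 0 is always kept because `shift()` leaves a missing value there, which compares as "not equal". One caveat: missing values never compare equal, so two consecutive rows that both have `NaN` in a key column are not treated as duplicates by this approach.

```python
import pandas as pd

def drop_consecutive_duplicates(df, cols):
    """Hypothetical helper: keep a row only if at least one of `cols`
    differs from the row directly above it (the first row of each run survives)."""
    changed = df[cols].ne(df[cols].shift()).any(axis=1)
    return df[changed]

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)

df_1 = drop_consecutive_duplicates(df, ['People', 'Year', 'Project'])
print(df_1.index.tolist())   # [0, 2, 3, 4, 5]
```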

Details:

print (df[cols].ne(df[cols].shift()))
   People   Year  Project
0    True   True     True
1   False  False    False
2   False   True    False
3   False   True    False
4    True  False     True
5   False   True     True
6   False  False    False

print (df[cols].ne(df[cols].shift()).any(axis=1))
0     True
1    False
2     True
3     True
4     True
5     True
6    False
dtype: bool
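An alternative built on the boolean mask from the details (not part of the original answer): its cumulative sum gives every run of identical key values its own id, and grouping by that id lets you keep either the first or the last row of each run. This is a sketch under the assumption that keeping the last row might also be wanted; `run_id`, `first_of_run` and `last_of_run` are illustrative names only.

```python
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)
cols = ['People', 'Year', 'Project']

# True where a row starts a new run (the same mask as in the details)
starts_run = df[cols].ne(df[cols].shift()).any(axis=1)

# cumsum turns the run-start flags into a run id: 1, 1, 2, 3, 4, 5, 5
run_id = starts_run.cumsum()

first_of_run = df.groupby(run_id).head(1)   # same rows as the answer
last_of_run = df.groupby(run_id).tail(1)    # last row of each run instead
print(first_of_run.index.tolist())   # [0, 2, 3, 4, 5]
print(last_of_run.index.tolist())    # [1, 2, 3, 4, 6]
```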
