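I have the data frame below and want to delete consecutive duplicate rows, that is, rows where "People", "Year" and "Project" are the same as in the row immediately before.
With the original data frame shown here, a row should be removed only when it directly follows a row with the same "People", "Year" and "Project".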
import pandas as pd

data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)
I tried this, but it does not work:
df_1 = df.loc[(df['People', 'Year', 'Project'].shift() != df['People', 'Year', 'Project'])]
I also tried this, but it removes the non-consecutive row "David, 2016, TN, 620" as well:
df_1 = df.drop_duplicates(subset=['People','Year','Project'])
After changing it to this, all rows are kept:
df_1 = df.drop_duplicates(subset=['People','Year','Project', 'Earning'])
What is the correct way to do this? Thanks!
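You can compare the selected columns with their DataFrame.shift-ed values for inequality, test for at least one True per row with DataFrame.any, and then filter with boolean indexing: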
cols = ['People','Year','Project']
df_1 = df[df[cols].ne(df[cols].shift()).any(axis=1)]
print (df_1)
  People  Year Project  Earning
0  David  2016      TN      878
2  David  2017      TN      767
3  David  2016      TN      620
4   John  2016      DJ      964
5   John  2017      DM      610
Details:
print (df[cols].ne(df[cols].shift()))
   People   Year  Project
0    True   True     True
1   False  False    False
2   False   True    False
3   False   True    False
4    True  False     True
5   False   True     True
6   False  False    False
print (df[cols].ne(df[cols].shift()).any(axis=1))
0     True
1    False
2     True
3     True
4     True
5     True
6    False
dtype: bool
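For reuse, the same shift-and-compare idea can be wrapped in a small helper. This is only a sketch: the function name drop_consecutive_duplicates is made up for illustration and is not part of pandas. It also shows where the first attempt in the question goes wrong: df['People', 'Year', 'Project'] looks up a single tuple key instead of a list of columns, and the comparison result still has one boolean per column, so it has to be reduced to one boolean per row with any(axis=1) before filtering.

```python
import pandas as pd


def drop_consecutive_duplicates(df, cols):
    """Keep a row unless it equals the previous row on every column in cols.

    Hypothetical helper name; it only combines shift, ne and any.
    """
    # True wherever a value differs from the value one row above.
    # The first row is compared against NaN, so it always counts as changed.
    changed = df[cols].ne(df[cols].shift())
    # A row is kept if at least one of the key columns changed.
    return df[changed.any(axis=1)]


data = {'People': ["David", "David", "David", "David", "John", "John", "John"],
        'Year': ["2016", "2016", "2017", "2016", "2016", "2017", "2017"],
        'Project': ["TN", "TN", "TN", "TN", "DJ", "DM", "DM"],
        'Earning': [878, 682, 767, 620, 964, 610, 772]}
df = pd.DataFrame(data)

result = drop_consecutive_duplicates(df, ['People', 'Year', 'Project'])
print(result)
```

One caveat: because NaN compares as not equal to NaN, consecutive rows with missing values in the key columns are treated as different and are all kept.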