My data is in the following dataframe:
import pandas as pd
df = pd.DataFrame({'AccID': ['001','001','001','002','002','003'],
                   'AccTypes': ['A','B','C','A','B','C'],
                   'Status': ['Closed','Active','Active','Active','Closed','Active'],
                   'Years': [5,15,10,20,25,30]})
AccID AccTypes Status Years
001 A Closed 5
001 B Active 15
001 C Active 10
002 A Active 20
002 B Closed 25
003 C Active 30
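I want to create another column named "ActiveYears". For an Active row, its value should be the maximum Years among all Active rows with the same AccID, regardless of AccTypes. For a Closed row, it should just be that row's Years. The expected output is as follows: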
AccID AccTypes Status Years ActiveYears Explanations
001 A Closed 5 5 # Status = Closed, we set ActiveYears = Years
001 B Active 15 15 # Status = Active, we select the maximum year of AccID = 001 with active status
001 C Active 10 15 # Status = Active, we select the maximum year of AccID = 001 with active status
002 A Active 20 20 # Status = Active, we select the maximum year of AccID = 002 with active status
002 B Closed 25 25 # Status = Closed, we set ActiveYears = Years
003 C Active 30 30 # Status = Active, we select the maximum year of AccID = 003 with active status
I can do this with a loop, but that is not very elegant. Is there a better way than looping? Thank you.
You can use the following:
First, handle the Closed status:
df.loc[df.Status == 'Closed','ActiveYears'] = df.loc[df.Status == 'Closed','Years']
Then use a groupby transform to handle the Active status:
df.loc[df.Status == 'Active', 'ActiveYears'] = df[df.Status == 'Active'].groupby('AccID')['Years'].transform('max')
print(df)
AccID AccTypes Status Years ActiveYears
0 001 A Closed 5 5.0
1 001 B Active 15 15.0
2 001 C Active 10 15.0
3 002 A Active 20 20.0
4 002 B Closed 25 25.0
5 003 C Active 30 30.0
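The two `.loc` assignments leave ActiveYears as a float column (5.0, 15.0, ...), because the first assignment creates the column with NaN in the not-yet-filled rows. A single-pass variant (a sketch, not part of the original answer) masks non-Active years with `Series.where`, takes the per-AccID maximum with `groupby(...).transform('max')`, and then keeps each Closed row's own Years. Every row ends up with a value (an Active row's group always contains at least that row), so casting back to `int` is safe for this data.

```python
import pandas as pd

df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})

active = df['Status'].eq('Active')

# Years of Active rows only (Closed rows become NaN), then the per-AccID max
# broadcast back to every row of that AccID; groupby max skips the NaNs.
active_max = df['Years'].where(active).groupby(df['AccID']).transform('max')

# Active rows take their group's active maximum; Closed rows keep their own Years.
df['ActiveYears'] = active_max.where(active, df['Years']).astype(int)
print(df)
```

Unlike the two-step version, this one needs no intermediate NaN column, and the result keeps an integer dtype.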