Suppose I have a DataFrame df:
t status
1 ok
2 ok
3 ok
4 closed
5 closed
6 closed
7 bad input
8 bad input
9 closed
10 closed
11 ok
12 ok
13 closed
14 closed
I want to identify when "closed" appears and how long each occurrence lasts.
So the result should be:
t status index
1 ok 0
2 ok 0
3 ok 0
4 closed 1
5 closed 1
6 closed 1
7 bad input 0
8 bad input 0
9 closed 2
10 closed 2
11 ok 0
12 ok 0
13 closed 3
14 closed 3
I tried the standard for-loop approach, but it is not feasible for a large DataFrame. I am considering a solution based on numpy where and repeat:
np.where(df['status'] == 'closed', 1, 0)
but I am stuck on adding 1 each time "closed" reappears.
IIUC, we can use shift and cumsum to build the condition:
df['New']=0
df.loc[df.status=='closed','New']=(df.status.eq('closed')&df.status.shift().ne('closed')).cumsum()
df
t status New
0 1 ok 0
1 2 ok 0
2 3 ok 0
3 4 closed 1
4 5 closed 1
5 6 closed 1
6 7 badinput 0
7 8 badinput 0
8 9 closed 2
9 10 closed 2
10 11 ok 0
11 12 ok 0
12 13 closed 3
13 14 closed 3
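For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same shift + cumsum idea. It rebuilds the sample data from the question, and it also summarizes each run, since the question asks how long each "closed" stretch lasts. The column name run_id and the variable names is_closed, run_start and runs are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "t": range(1, 15),
    "status": ["ok"] * 3 + ["closed"] * 3 + ["bad input"] * 2
              + ["closed"] * 2 + ["ok"] * 2 + ["closed"] * 2,
})

is_closed = df["status"].eq("closed")
# True only on the first row of each consecutive run of "closed"
run_start = is_closed & ~is_closed.shift(fill_value=False)
# Number the runs 1, 2, 3, ... and set every non-closed row to 0
df["run_id"] = run_start.cumsum().where(is_closed, 0)

# One row per run: first t, last t, and number of rows in the run
runs = (
    df[is_closed]
    .groupby("run_id")["t"]
    .agg(start="min", end="max", length="count")
)
print(df)
print(runs)
```

Here length counts rows, so it only equals elapsed time if t is evenly spaced; otherwise end - start may be the better measure of duration.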