如何在同一列中找到连续出现的最高计数,例如相同数目,增加值或减少值。
所以给这样的东西:
h_diff l_diff monotonic
timestamp
2000-01-18 NaN NaN NaN
2000-01-19 2.75 2.93 1.0
2000-01-20 12.75 10.13 1.0
2000-01-21 -7.25 -3.31 0.0
2000-01-24 -1.50 -5.07 0.0
2000-01-25 0.37 -2.75 1.0
2000-01-26 1.07 7.38 1.0
2000-01-27 -1.19 -2.75 0.0
2000-01-28 -2.13 -6.38 0.0
2000-01-31 -7.00 -6.12 0.0
h_diff中正值的单调性最高值为2,负值中的单调性最高值为3。所以给定滚动10或n,我如何找到最高的单调计数,同时仍然能够动态更改窗口大小?
这给我单调列的1.0值:lambda x:np.all(np.diff(x)> 0)和lambda x:np.count_nonzero(np.diff(x)> 0)将计算的总计数1.0的窗口,但我想找到的是给定窗口系列中运行时间最长的窗口。
我希望是这样的:
h_diff l_diff monotonic
timestamp
2000-01-18 NaN NaN NaN
2000-01-19 2.75 2.93 1.0
2000-01-20 12.75 10.13 2.0
2000-01-21 -7.25 -3.31 0.0
2000-01-24 -1.50 -5.07 0.0
2000-01-25 0.37 -2.75 1.0
2000-01-26 1.07 7.38 2.0
2000-01-27 1.19 -2.75 3.0
2000-01-28 2.13 -6.38 4.0
2000-01-31 -7.00 -6.12 0.0
使用GroupBy.cumcount
+ Series.where
。
初始数据框
h_diff l_diff
timestamp
2000-01-18 NaN NaN
2000-01-19 2.75 2.93
2000-01-20 12.75 10.13
2000-01-21 -7.25 -3.31
2000-01-24 -1.50 -5.07
2000-01-25 0.37 -2.75
2000-01-26 1.07 7.38
2000-01-27 1.19 -2.75
2000-01-28 2.13 -6.38
2000-01-31 -7.00 -6.12
h = df['h_diff'].gt(0)
#h = np.sign(df['h_diff'])
df['monotonic_h']=h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1).where(h,0)
print(df)
h_diff l_diff monotonic_h
timestamp
2000-01-18 NaN NaN 0
2000-01-19 2.75 2.93 1
2000-01-20 12.75 10.13 2
2000-01-21 -7.25 -3.31 0
2000-01-24 -1.50 -5.07 0
2000-01-25 0.37 -2.75 1
2000-01-26 1.07 7.38 2
2000-01-27 1.19 -2.75 3
2000-01-28 2.13 -6.38 4
2000-01-31 -7.00 -6.12 0
df['monotonic_h'].max()
#4
详情
h.ne(h.shift()).cumsum()
timestamp
2000-01-18 1
2000-01-19 2
2000-01-20 2
2000-01-21 3
2000-01-24 3
2000-01-25 4
2000-01-26 4
2000-01-27 4
2000-01-28 4
2000-01-31 5
Name: h_diff, dtype: int64
更新
df = df.join( h.groupby(h.ne(h.shift()).cumsum()).cumcount().add(1)
.to_frame('values')
.assign(monotic = np.where(h,'monotic_h_greater_0',
'monotic_h_not_greater_0'),
index = lambda x: x.index)
.where(df['h_diff'].notna())
.pivot_table(columns = 'monotic',
index = 'index',
values = 'values',
fill_value=0) )
print(df)
h_diff l_diff monotic_h_greater_0 monotic_h_not_greater_0
timestamp
2000-01-18 NaN NaN NaN NaN
2000-01-19 2.75 2.93 1.0 0.0
2000-01-20 12.75 10.13 2.0 0.0
2000-01-21 -7.25 -3.31 0.0 1.0
2000-01-24 -1.50 -5.07 0.0 2.0
2000-01-25 0.37 -2.75 1.0 0.0
2000-01-26 1.07 7.38 2.0 0.0
2000-01-27 1.19 -2.75 3.0 0.0
2000-01-28 2.13 -6.38 4.0 0.0
2000-01-31 -7.00 -6.12 0.0 1.0
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句