我正在使用tips
seaborn的数据集:
import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips['rowid'] = tips.index
我想创建一个列,该列将是累计过滤了3个以上且带有过滤条件的男性和晚餐的人的总数。该计数不应包含当前行(1 PRECEDING
在以下查询中为cf )。
SQL等效项为:
SELECT *,
SUM(CASE WHEN tip >= 3 AND sex='male' AND time='Dinner' THEN 1 ELSE NULL END)
OVER (PARTITION BY sex, time ORDER BY rowid ROWS BETWEEN unbounded PRECEDING AND 1 PRECEDING) as cnt
FROM tips
ORDER BY rowid ;
如何在Pandas上实现相同目标?从我读到的内容来看,我可能会使用一些滚动和转换功能,但是我并没有成功。
最终数据帧应满足以下条件:
编辑:ansev请求的数据帧的切片
total_bill tip sex smoker day time size rowid cnt
index
0 16.99 1.01 Female No Sun Dinner 2 0 NaN
1 10.34 1.66 Male No Sun Dinner 3 1 NaN
2 21.01 3.50 Male No Sun Dinner 3 2 NaN
3 23.68 3.31 Male No Sun Dinner 2 3 1.0
4 24.59 3.61 Female No Sun Dinner 4 4 NaN
5 25.29 4.71 Male No Sun Dinner 4 5 2.0
6 8.77 2.00 Male No Sun Dinner 2 6 3.0
7 26.88 3.12 Male No Sun Dinner 4 7 3.0
8 15.04 1.96 Male No Sun Dinner 2 8 4.0
9 14.78 3.23 Male No Sun Dinner 2 9 4.0
10 10.27 1.71 Male No Sun Dinner 2 10 5.0
11 35.26 5.00 Female No Sun Dinner 4 11 NaN
12 15.42 1.57 Male No Sun Dinner 2 12 5.0
13 18.43 3.00 Male No Sun Dinner 4 13 5.0
14 14.83 3.02 Female No Sun Dinner 2 14 NaN
15 21.58 3.92 Male No Sun Dinner 2 15 6.0
16 10.33 1.67 Female No Sun Dinner 3 16 NaN
17 16.29 3.71 Male No Sun Dinner 3 17 7.0
18 16.97 3.50 Female No Sun Dinner 3 18 NaN
19 20.65 3.35 Male No Sat Dinner 3 19 8.0
20 17.92 4.08 Male No Sat Dinner 2 20 9.0
21 20.29 2.75 Female No Sat Dinner 2 21 NaN
22 15.77 2.23 Female No Sat Dinner 2 22 NaN
23 39.42 7.58 Male No Sat Dinner 4 23 10.0
24 19.82 3.18 Male No Sat Dinner 2 24 11.0
25 17.81 2.34 Male No Sat Dinner 4 25 12.0
26 13.37 2.00 Male No Sat Dinner 2 26 12.0
27 12.69 2.00 Male No Sat Dinner 2 27 12.0
28 21.70 4.30 Male No Sat Dinner 2 28 12.0
29 19.65 3.00 Female No Sat Dinner 2 29 NaN
我觉得你需要
df['cnt'] = ( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
.ge(3)
.cumsum()
.shift() )
# if not ordered
#df['cnt'] = ( df.sort_values('rowid')
# .loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
# .ge(3)
# .cumsum()
# .shift() )
更新
df['cnt']=( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
.ge(3)
.cumsum()
.shift()
.where(lambda x: x.gt(0))
)
# total_bill tip sex smoker day time size rowid cnt
#index
#0 16.99 1.01 Female No Sun Dinner 2 0 NaN
#1 10.34 1.66 Male No Sun Dinner 3 1 NaN
#2 21.01 3.50 Male No Sun Dinner 3 2 NaN
#3 23.68 3.31 Male No Sun Dinner 2 3 1.0
#4 24.59 3.61 Female No Sun Dinner 4 4 NaN
#5 25.29 4.71 Male No Sun Dinner 4 5 2.0
#6 8.77 2.00 Male No Sun Dinner 2 6 3.0
#7 26.88 3.12 Male No Sun Dinner 4 7 3.0
#8 15.04 1.96 Male No Sun Dinner 2 8 4.0
#9 14.78 3.23 Male No Sun Dinner 2 9 4.0
#10 10.27 1.71 Male No Sun Dinner 2 10 5.0
#11 35.26 5.00 Female No Sun Dinner 4 11 NaN
#12 15.42 1.57 Male No Sun Dinner 2 12 5.0
#13 18.43 3.00 Male No Sun Dinner 4 13 5.0
#14 14.83 3.02 Female No Sun Dinner 2 14 NaN
#15 21.58 3.92 Male No Sun Dinner 2 15 6.0
#16 10.33 1.67 Female No Sun Dinner 3 16 NaN
#17 16.29 3.71 Male No Sun Dinner 3 17 7.0
#18 16.97 3.50 Female No Sun Dinner 3 18 NaN
#19 20.65 3.35 Male No Sat Dinner 3 19 8.0
#20 17.92 4.08 Male No Sat Dinner 2 20 9.0
#21 20.29 2.75 Female No Sat Dinner 2 21 NaN
#22 15.77 2.23 Female No Sat Dinner 2 22 NaN
#23 39.42 7.58 Male No Sat Dinner 4 23 10.0
#24 19.82 3.18 Male No Sat Dinner 2 24 11.0
#25 17.81 2.34 Male No Sat Dinner 4 25 12.0
#26 13.37 2.00 Male No Sat Dinner 2 26 12.0
#27 12.69 2.00 Male No Sat Dinner 2 27 12.0
#28 21.70 4.30 Male No Sat Dinner 2 28 12.0
#29 19.65 3.00 Female No Sat Dinner 2 29 NaN
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句