带有窗口函数的熊猫累积和

mbh86

我正在使用tipsseaborn的数据集

import pandas as pd
import seaborn as sns
tips = sns.load_dataset("tips")
tips['rowid'] = tips.index

我想创建一个列,该列将是累计过滤了3个以上且带有过滤条件的男性和晚餐的人的总数。该计数不应包含当前行(1 PRECEDING在以下查询中为cf )。

SQL等效项为:

SELECT *, 
    SUM(CASE WHEN tip >= 3 AND sex='male' AND time='Dinner' THEN 1 ELSE NULL END) 
            OVER (PARTITION BY sex, time ORDER BY rowid ROWS BETWEEN unbounded PRECEDING AND 1 PRECEDING) as cnt
FROM tips
ORDER BY rowid ;

如何在Pandas上实现相同目标?从我读到的内容来看,我可能会使用一些滚动和转换功能,但是我并没有成功。

最终数据帧应满足以下条件:

在此处输入图片说明

编辑:ansev请求的数据帧的切片

    total_bill  tip sex smoker  day time    size    rowid   cnt
index                                   
0   16.99   1.01    Female  No  Sun Dinner  2   0   NaN
1   10.34   1.66    Male    No  Sun Dinner  3   1   NaN
2   21.01   3.50    Male    No  Sun Dinner  3   2   NaN
3   23.68   3.31    Male    No  Sun Dinner  2   3   1.0
4   24.59   3.61    Female  No  Sun Dinner  4   4   NaN
5   25.29   4.71    Male    No  Sun Dinner  4   5   2.0
6   8.77    2.00    Male    No  Sun Dinner  2   6   3.0
7   26.88   3.12    Male    No  Sun Dinner  4   7   3.0
8   15.04   1.96    Male    No  Sun Dinner  2   8   4.0
9   14.78   3.23    Male    No  Sun Dinner  2   9   4.0
10  10.27   1.71    Male    No  Sun Dinner  2   10  5.0
11  35.26   5.00    Female  No  Sun Dinner  4   11  NaN
12  15.42   1.57    Male    No  Sun Dinner  2   12  5.0
13  18.43   3.00    Male    No  Sun Dinner  4   13  5.0
14  14.83   3.02    Female  No  Sun Dinner  2   14  NaN
15  21.58   3.92    Male    No  Sun Dinner  2   15  6.0
16  10.33   1.67    Female  No  Sun Dinner  3   16  NaN
17  16.29   3.71    Male    No  Sun Dinner  3   17  7.0
18  16.97   3.50    Female  No  Sun Dinner  3   18  NaN
19  20.65   3.35    Male    No  Sat Dinner  3   19  8.0
20  17.92   4.08    Male    No  Sat Dinner  2   20  9.0
21  20.29   2.75    Female  No  Sat Dinner  2   21  NaN
22  15.77   2.23    Female  No  Sat Dinner  2   22  NaN
23  39.42   7.58    Male    No  Sat Dinner  4   23  10.0
24  19.82   3.18    Male    No  Sat Dinner  2   24  11.0
25  17.81   2.34    Male    No  Sat Dinner  4   25  12.0
26  13.37   2.00    Male    No  Sat Dinner  2   26  12.0
27  12.69   2.00    Male    No  Sat Dinner  2   27  12.0
28  21.70   4.30    Male    No  Sat Dinner  2   28  12.0
29  19.65   3.00    Female  No  Sat Dinner  2   29  NaN
安塞夫

我觉得你需要

df['cnt'] = ( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
                .ge(3)
                .cumsum()
                .shift() )

# if not ordered
#df['cnt'] = ( df.sort_values('rowid')
#                .loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
#                .ge(3)
#                .cumsum()
#                .shift() )

更新

df['cnt']=( df.loc[df['sex'].eq('Male') & df['time'].eq('Dinner'),'tip']
              .ge(3)
              .cumsum()
              .shift()
              .where(lambda x: x.gt(0))
            )

#       total_bill   tip     sex smoker  day    time  size  rowid   cnt
#index                                                                 
#0           16.99  1.01  Female     No  Sun  Dinner     2      0   NaN
#1           10.34  1.66    Male     No  Sun  Dinner     3      1   NaN
#2           21.01  3.50    Male     No  Sun  Dinner     3      2   NaN
#3           23.68  3.31    Male     No  Sun  Dinner     2      3   1.0
#4           24.59  3.61  Female     No  Sun  Dinner     4      4   NaN
#5           25.29  4.71    Male     No  Sun  Dinner     4      5   2.0
#6            8.77  2.00    Male     No  Sun  Dinner     2      6   3.0
#7           26.88  3.12    Male     No  Sun  Dinner     4      7   3.0
#8           15.04  1.96    Male     No  Sun  Dinner     2      8   4.0
#9           14.78  3.23    Male     No  Sun  Dinner     2      9   4.0
#10          10.27  1.71    Male     No  Sun  Dinner     2     10   5.0
#11          35.26  5.00  Female     No  Sun  Dinner     4     11   NaN
#12          15.42  1.57    Male     No  Sun  Dinner     2     12   5.0
#13          18.43  3.00    Male     No  Sun  Dinner     4     13   5.0
#14          14.83  3.02  Female     No  Sun  Dinner     2     14   NaN
#15          21.58  3.92    Male     No  Sun  Dinner     2     15   6.0
#16          10.33  1.67  Female     No  Sun  Dinner     3     16   NaN
#17          16.29  3.71    Male     No  Sun  Dinner     3     17   7.0
#18          16.97  3.50  Female     No  Sun  Dinner     3     18   NaN
#19          20.65  3.35    Male     No  Sat  Dinner     3     19   8.0
#20          17.92  4.08    Male     No  Sat  Dinner     2     20   9.0
#21          20.29  2.75  Female     No  Sat  Dinner     2     21   NaN
#22          15.77  2.23  Female     No  Sat  Dinner     2     22   NaN
#23          39.42  7.58    Male     No  Sat  Dinner     4     23  10.0
#24          19.82  3.18    Male     No  Sat  Dinner     2     24  11.0
#25          17.81  2.34    Male     No  Sat  Dinner     4     25  12.0
#26          13.37  2.00    Male     No  Sat  Dinner     2     26  12.0
#27          12.69  2.00    Male     No  Sat  Dinner     2     27  12.0
#28          21.70  4.30    Male     No  Sat  Dinner     2     28  12.0
#29          19.65  3.00  Female     No  Sat  Dinner     2     29   NaN

本文收集自互联网,转载请注明来源。

如有侵权,请联系[email protected] 删除。

编辑于
0

我来说两句

0条评论
登录后参与评论

相关文章

来自分类Dev

具有日期和NaT的系列的熊猫累积函数

来自分类Dev

熊猫数据框上的累积和函数

来自分类Dev

结果馈送滚动窗口或带有累积量的滚动应用

来自分类Dev

在ggplot中绘制带有负向和正向的累积直方图?

来自分类Dev

带有 SUM 和子查询的 MYSQL 累积总数

来自分类常见问题

带有日期滚动总和的窗口函数

来自分类Dev

带有日期滚动总和的窗口函数

来自分类Dev

带有LIMIT的PostgreSQL窗口函数

来自分类Dev

带有 COUNT(DISTINCT) 的 mySQL 窗口函数

来自分类Dev

在熊猫中应用带有args的函数

来自分类Dev

熊猫面板数据-返回带有年份差距的滚动累积总和

来自分类Dev

带有附件的列表累积

来自分类Dev

带有窗口和光标的黑屏

来自分类Dev

带有 && 和 || 的 IF 函数 同时

来自分类Dev

校正Timedelta列上的熊猫累积和

来自分类Dev

有关系的熊猫的累积价值排名

来自分类Dev

带有窗口函数的外部联接T-SQL

来自分类Dev

传递带有和不带有&的函数的指针

来自分类Dev

带有和不带有括号的python pandas函数

来自分类Dev

R中带有NA的累积收益

来自分类Dev

带有jQuery和JavaScript函数的onkeydown

来自分类Dev

带有lodash的filterfirst和filterlast函数

来自分类Dev

带有对象的最小和最大函数

来自分类Dev

带有lodash的filterfirst和filterlast函数

来自分类Dev

带有 `Val` 和 `Def` 的高阶函数

来自分类Dev

带有聚合函数的 COALESCE 和 NULLIF

来自分类Dev

带有函数和 this 的 Javascript 对象

来自分类Dev

较短和更有效的熊猫代码,用于基于累积的数据选择和基于列的数据选择

来自分类Dev

打开和关闭带有关闭窗口