Filter rows with the n largest values per group

Anders Swanson

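Context

For each team, I want the DataFrame rows for its three highest-scoring players.

In my head this is a combination of DataFrame.nlargest() and DataFrame.groupby(), but I don't think that is supported. My ideal solution would:

  • operate directly on df, without creating additional DataFrames,
  • be clear, and
  • perform reasonably well (the real df has 7M rows and 5 columns)

Input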

import pandas as pd
df = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"A","4":"A","5":"B","6":"B","7":"B","8":"B","9":"B","10":"C","11":"C","12":"C","13":"C","14":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Donna","4":"Elizabeth","5":"Fran","6":"Greta","7":"Heather","8":"Iris","9":"Jackie","10":"Kelly","11":"Lucy","12":"Molly","13":"Nina","14":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":8,"4":10,"5":28,"6":29,"7":18,"8":25,"9":9,"10":12,"11":23,"12":18,"13":10,"14":15}}')
| team | player    | points |
|------|-----------|--------|
| A    | Alice     | 15     |
| A    | Becky     | 11     |
| A    | Carmen    | 13     |
| A    | Donna     | 8      |
| A    | Elizabeth | 10     |
| B    | Fran      | 28     |
| B    | Greta     | 29     |
| B    | Heather   | 18     |
| B    | Iris      | 25     |
| B    | Jackie    | 9      |
| C    | Kelly     | 12     |
| C    | Lucy      | 23     |
| C    | Molly     | 18     |
| C    | Nina      | 10     |
| C    | Ophelia   | 15     |

Expected output

df_output = pd.read_json('{"team":{"0":"A","1":"A","2":"A","3":"B","4":"B","5":"B","6":"C","7":"C","8":"C"},"player":{"0":"Alice","1":"Becky","2":"Carmen","3":"Fran","4":"Greta","5":"Iris","6":"Lucy","7":"Molly","8":"Ophelia"},"points":{"0":15,"1":11,"2":13,"3":28,"4":29,"5":25,"6":23,"7":18,"8":15}}')
df_output
| team | player  | points |
|------|---------|--------|
| A    | Alice   | 15     |
| A    | Becky   | 11     |
| A    | Carmen  | 13     |
| B    | Fran    | 28     |
| B    | Greta   | 29     |
| B    | Iris    | 25     |
| C    | Lucy    | 23     |
| C    | Molly   | 18     |
| C    | Ophelia | 15     |

Mayank Porwal

You can rank the points within each team with groupby(...).rank and keep the rows whose rank is 3 or better:

In [1401]: df[df.groupby('team')['points'].rank(ascending=False) <= 3]
Out[1401]: 
   team   player  points
0     A    Alice      15
1     A    Becky      11
2     A   Carmen      13
5     B     Fran      28
6     B    Greta      29
8     B     Iris      25
11    C     Lucy      23
12    C    Molly      18
14    C  Ophelia      15
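
One caveat about the rank approach: by default, rank assigns tied values their average rank, so a tie at the cutoff can return more (or fewer) than three rows for a team. The sketch below is a minimal illustration, not part of the original answer. It assumes that breaking ties by order of appearance (method="first") is acceptable, and it also shows the groupby plus nlargest combination mentioned in the question. The helper names top_n_by_rank and top_n_by_nlargest are hypothetical.

```python
import pandas as pd

# Same data as the question, built directly instead of via read_json.
df = pd.DataFrame({
    "team": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "player": ["Alice", "Becky", "Carmen", "Donna", "Elizabeth",
               "Fran", "Greta", "Heather", "Iris", "Jackie",
               "Kelly", "Lucy", "Molly", "Nina", "Ophelia"],
    "points": [15, 11, 13, 8, 10, 28, 29, 18, 25, 9, 12, 23, 18, 10, 15],
})


def top_n_by_rank(frame, n=3):
    """Hypothetical helper: keep at most n rows per team via a boolean mask."""
    # method="first" ranks tied values in order of appearance,
    # so ranks within a team are always 1, 2, 3, ... with no repeats.
    ranks = frame.groupby("team")["points"].rank(method="first", ascending=False)
    return frame[ranks <= n]


def top_n_by_nlargest(frame, n=3):
    """Hypothetical helper: the groupby + nlargest combination."""
    # SeriesGroupBy.nlargest returns a Series indexed by (team, original label);
    # the last index level recovers the original row labels.
    largest = frame.groupby("team")["points"].nlargest(n)
    labels = largest.index.get_level_values(-1).sort_values()
    return frame.loc[labels]


by_rank = top_n_by_rank(df)
by_nlargest = top_n_by_nlargest(df)

# A team where every player is tied: the default average rank is 2.5 for all
# four rows, so the unmodified filter keeps all of them.
tied = pd.DataFrame({"team": ["X"] * 4, "points": [10, 10, 10, 10]})
default_rank_rows = tied[tied.groupby("team")["points"].rank(ascending=False) <= 3]
first_rank_rows = top_n_by_rank(tied)
```

On the sample data both helpers select the same rows as the answer above. Which of them is faster on a 7M-row frame is worth measuring rather than assuming.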
