How to delete minority rows from a pandas data frame?

Roman

I have a data frame. For every combination of values in the first two columns, I would like to delete all rows belonging to that combination if it occurs in fewer than 100 rows.

For example, there are 5 rows with "A" in the first column and "B" in the second column. I would like to delete all of these rows from the data frame. There are 110 rows in which the first and second columns contain "C" and "D", respectively. I do not want to delete these rows, since 110 is not smaller than 100.
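To make the rule concrete, here is a small hypothetical frame built to match the example above. The column names `L_ID` and `P_ID` are taken from the asker's code below; the `value` column and the data itself are invented for illustration. `groupby(...).size()` reports how many rows each combination has, which is the number compared against 100.

```python
import pandas as pd

# Hypothetical data mirroring the example: 5 rows of ("A", "B")
# and 110 rows of ("C", "D"). "value" is an invented payload column.
df = pd.DataFrame({
    "L_ID": ["A"] * 5 + ["C"] * 110,
    "P_ID": ["B"] * 5 + ["D"] * 110,
    "value": range(115),
})

# Number of rows for every (L_ID, P_ID) combination:
# (A, B) has 5 rows, (C, D) has 110 rows.
counts = df.groupby(["L_ID", "P_ID"]).size()
print(counts)
```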

What is the most elegant and fastest way to do that?

This is the solution that I have at the moment:

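min_n = 100  # minimum number of rows an (L_ID, P_ID) combination needs
gr = df.groupby(['L_ID', 'P_ID'])
for group in gr.groups:
    # group is an (L_ID, P_ID) tuple; count how many rows it has
    n_vals = len(gr.get_group(group))
    if n_vals < min_n:
        # keep only the rows that do not belong to this small combination
        df = df[(df['L_ID'] != group[0]) | (df['P_ID'] != group[1])]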
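The loop above re-filters the whole frame once per small group. A loop-free version of the same logic counts each combination once and then builds a single boolean mask by tuple membership. This is a minimal sketch, assuming a recent pandas version; `drop_rare_combinations` is a hypothetical helper name, and it assumes `keys` lists two or more columns (so that the group sizes are indexed by a `MultiIndex`).

```python
import pandas as pd


def drop_rare_combinations(df, keys, min_n):
    """Keep only rows whose combination of `keys` occurs at least min_n times.

    Hypothetical helper. Assumes `keys` is a list of two or more column names.
    """
    sizes = df.groupby(keys).size()
    # MultiIndex of the combinations that are frequent enough to keep
    frequent = sizes[sizes >= min_n].index
    # One tuple per row, e.g. ("C", "D"), tested for membership in `frequent`
    row_keys = pd.MultiIndex.from_frame(df[keys])
    return df[row_keys.isin(frequent)]


# Usage on hypothetical data shaped like the example in the question
df = pd.DataFrame({
    "L_ID": ["A"] * 5 + ["C"] * 110,
    "P_ID": ["B"] * 5 + ["D"] * 110,
    "value": range(115),
})
result = drop_rare_combinations(df, ["L_ID", "P_ID"], min_n=100)
print(len(result))  # 110: the 5 ("A", "B") rows are dropped
```

The design choice here is to do the counting with one `groupby` pass and the row selection with one vectorized comparison, instead of one full-frame filter per deleted combination.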
Roman Pekar

You can use the filter() method:

# test data
>>> df1 = pd.DataFrame({'a':list('AAABB'), 'b':list('BBBAA'), 'c':range(5)})
>>> df1
   a  b  c
0  A  B  0
1  A  B  1
2  A  B  2
3  B  A  3
4  B  A  4

>>> df1.groupby(['a','b']).filter(lambda x: len(x) > 2)
   a  b  c
0  A  B  0
1  A  B  1
2  A  B  2
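Applied to the question itself, the threshold becomes a parameter. Because the question deletes combinations with fewer than 100 rows, the predicate keeps groups with `len(g) >= min_n`. This sketch assumes a recent pandas release (the update below describes a failure seen with an older version) and uses invented data with the asker's column names.

```python
import pandas as pd

min_n = 100  # threshold from the question

# Hypothetical data shaped like the question's frame
df = pd.DataFrame({
    "L_ID": ["A"] * 5 + ["C"] * 110,
    "P_ID": ["B"] * 5 + ["D"] * 110,
    "value": range(115),
})

# filter() calls the lambda once per (L_ID, P_ID) sub-frame g and keeps
# all rows of the groups for which it returns True.
result = df.groupby(["L_ID", "P_ID"]).filter(lambda g: len(g) >= min_n)
print(len(result))  # 110
```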

Update

It looks like, in the pandas version I was using, this method does not work when there are more columns:

>>> df1 = pd.DataFrame({'a':list('AAABB'), 'b':list('BBBAA'), 'c':range(5), 'd':range(5)})
>>> df1
   a  b  c  d
0  A  B  0  0
1  A  B  1  1
2  A  B  2  2
3  B  A  3  3
4  B  A  4  4
>>> df1.groupby(['a','b']).filter(lambda x: len(x) > 2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 2094, in filter
    if res:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Here's a workaround: call len() on a single column of the group instead of on the whole group:

>>> df1.groupby(['a','b']).filter(lambda x: len(x['c']) > 2)
   a  b  c  d
0  A  B  0  0
1  A  B  1  1
2  A  B  2  2

You can also use transform() to build a boolean mask over the original rows:

>>> df1[df1.groupby(['a','b'])['c'].transform(lambda x: len(x) > 2).astype(bool)]
   a  b  c  d
0  A  B  0  0
1  A  B  1  1
2  A  B  2  2
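A variant of the transform() approach, assuming a recent pandas version: passing the string "size" broadcasts each group's row count back onto every row, so the result is an integer Series aligned with the frame's index that can be compared with the threshold directly, without a Python lambda or `.astype(bool)`. The data and the `value` column are hypothetical.

```python
import pandas as pd

min_n = 100  # threshold from the question

df = pd.DataFrame({
    "L_ID": ["A"] * 5 + ["C"] * 110,
    "P_ID": ["B"] * 5 + ["D"] * 110,
    "value": range(115),
})

# Row count of each row's (L_ID, P_ID) group, repeated on every row
group_sizes = df.groupby(["L_ID", "P_ID"])["value"].transform("size")

# Keep the rows whose group has at least min_n rows
result = df[group_sizes >= min_n]
print(len(result))  # 110
```

"size" is used rather than "count" because "count" skips missing values in the selected column, while "size" counts every row of the group.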
