Pandas: Delete rows of a DataFrame if total count of a particular column occurs only 1 time

Levine

I want to delete the rows of a DataFrame whose value in a particular column occurs only once in that column.

Example of raw table (values are arbitrary for illustrative purposes):

print(df)

     Country     Series          Value
0    Bolivia     Population      123
1    Kenya       Population      1234
2    Ukraine     Population      12345
3    US          Population      123456
5    Bolivia     GDP             23456
6    Kenya       GDP             234567
7    Ukraine     GDP             2345678
8    US          GDP             23456789
9    Bolivia     #McDonalds      3456
10   Kenya       #Schools        3455
11   Ukraine     #Cars           3456
12   US          #Tshirts        3456789

Intended outcome:

print(df)

     Country     Series          Value
0    Bolivia     Population      123
1    Kenya       Population      1234
2    Ukraine     Population      12345
3    US          Population      123456
5    Bolivia     GDP             23456
6    Kenya       GDP             234567
7    Ukraine     GDP             2345678
8    US          GDP             23456789

I know that df.Series.value_counts() > 1 identifies which values of df.Series occur more than once, and that the output looks something like the following:

     Population     True
     GDP            True
     #McDonalds    False
     #Schools      False
     #Cars         False
     #Tshirts      False

I want to write something like the following so that my new DataFrame drops the rows whose df.Series value occurs only once, but this doesn't work: df.drop(df.Series.value_counts()==1,axis=1,inplace=True)

Gustavo Bezerra

You can do this by building a boolean list/array, either with a list comprehension or with the column's string methods (the .str accessor).

The list comprehension approach is:

vc = df['Series'].value_counts()
singles = set(vc[vc == 1].index)              # values that occur exactly once (built a single time)
u = [i not in singles for i in df['Series']]  # True for rows to keep
df = df[u]
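For reference, the same boolean-mask idea can be run end to end on the sample data from the question. This is only a minimal sketch, not part of the original answer: it rebuilds the table by hand and swaps the list comprehension for Series.isin, which performs the same membership test in a vectorized way. The names singles, keep and result are illustrative.

```python
import pandas as pd

# Sample data copied from the question (the index skips 4, as in the original).
df = pd.DataFrame(
    {
        "Country": ["Bolivia", "Kenya", "Ukraine", "US"] * 3,
        "Series": ["Population"] * 4 + ["GDP"] * 4
                  + ["#McDonalds", "#Schools", "#Cars", "#Tshirts"],
        "Value": [123, 1234, 12345, 123456,
                  23456, 234567, 2345678, 23456789,
                  3456, 3455, 3456, 3456789],
    },
    index=[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12],
)

vc = df["Series"].value_counts()      # number of occurrences of each label
singles = vc[vc == 1].index           # labels that occur exactly once
keep = ~df["Series"].isin(singles)    # True for rows whose label repeats
result = df[keep]

print(result)
```

Only the Population and GDP rows remain, and the original index labels are preserved.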

The other approach is to use the str.contains method to check whether the values of the Series column contain a given string or match a given regular expression (used in this case as we are using multiple strings):

vc  = df['Series'].value_counts()
pat = r'|'.join(vc[vc == 1].index)          # regular expression: alternation of the single-occurrence values
df  = df[~df['Series'].str.contains(pat)]   # tilde negates the boolean mask

This regular-expression approach is a bit more hackish and may need extra processing of pat (escaping, for example with re.escape) if the strings you want to filter out contain regex metacharacters, which takes some basic regex knowledge. Note also that str.contains matches substrings, so a pattern built from one value also matches any longer value that contains it. However, it's worth noting that this approach was about 4x faster than the list comprehension when tested on the data provided in the question.
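To make the escaping and substring caveats concrete, here is a hedged sketch. The labels are hypothetical (they are not from the question) and were chosen so that one contains a regex metacharacter and another is a substring of a repeated value. Escaping each label with re.escape and anchoring the alternation is one possible way to handle both cases, not the only one.

```python
import re
import pandas as pd

# Hypothetical labels: "A+B" contains a metacharacter, and "GDP" is a
# substring of the repeated label "GDP per capita".
df = pd.DataFrame({"Series": ["GDP per capita", "GDP per capita", "GDP", "A+B"]})

vc = df["Series"].value_counts()
singles = vc[vc == 1].index           # "GDP" and "A+B"

# Unescaped, "A+B" means "one or more A followed by B", so it does not
# match the literal text "A+B"; the escaped form does.
assert re.search("A+B", "A+B") is None
assert re.search(re.escape("A+B"), "A+B") is not None

# Escaping alone is not enough here: str.contains looks for substrings,
# so this unanchored pattern would also match "GDP per capita".
loose = "|".join(re.escape(s) for s in singles)

# Anchor the alternation so that only whole values match.
pat = "^(?:" + loose + ")$"
result = df[~df["Series"].str.contains(pat)]

print(result)
```

With the anchored pattern, only the two "GDP per capita" rows are kept; the unanchored pattern would have dropped every row.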

As a side note, I recommend avoiding the word Series as a column name, since that is also the name of a pandas class.
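If renaming is an option, DataFrame.rename can do it in one step. This is a small sketch on a cut-down frame; the new name Indicator is a hypothetical choice, not one suggested in the question.

```python
import pandas as pd

# Cut-down version of the question's table, for illustration only.
df = pd.DataFrame({"Country": ["Bolivia", "Kenya"], "Series": ["GDP", "GDP"]})

# "Indicator" is a hypothetical replacement name.
df = df.rename(columns={"Series": "Indicator"})

print(list(df.columns))
```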
