Pandas: drop rows based on duplicated values in a list

Mike

I would like to drop rows from my dataframe when a piece of a string is duplicated within that string. For example, if the string is jkl-ghi-jkl, I would drop this row because jkl appears twice. I figured that creating a list and checking the list for duplicates would be the ideal approach.

My dataframe for this example consists of one column and two data points:

    import pandas as pd

    df1 = pd.DataFrame({'Col1': ['abc-def-ghi-jkl', 'jkl-ghi-jkl-mno']})

My first step is to split the data on "-":

    List = df1['Col1'].str.split('-')
    List

Which yields the output:

    0     [abc, def, ghi, jkl]
    1     [jkl, ghi, jkl, mno]
    Name: Col1, dtype: object

My second step is to convert the output into lists:

    List = List.tolist()

Which yields:

    [['abc', 'def', 'ghi', 'jkl'], ['jkl', 'ghi', 'jkl', 'mno']]

My last step is to compare each full list with the set of its unique values:

    len(List) > len(set(List))

Which yields the error:

    TypeError: unhashable type: 'list'

I am aware that my .tolist() creates a list of 2 series. Is there a way to convert these series into a list in order to test for duplicates? I wish to use this piece of code:

    len(List) > len(set(List))

with a drop in order to drop all rows with a duplicated value within each cell.

Is this the correct approach, or is there a simpler way?

My end output should look like:

     Col1
     abc-def-ghi-jkl

The string jkl-ghi-jkl-mno gets dropped because "jkl" appears twice.

Luis

Here is another option, using set and len:

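    import pandas as pd

    df1 = pd.DataFrame({'Col1': ['abc-def-ghi-jkl', 'jkl-ghi-jkl-mno']})

    # Count the distinct pieces in each string.
    df1['length'] = df1['Col1'].str.split('-').apply(set).apply(len)

    print(df1)

                  Col1  length
    0  abc-def-ghi-jkl       4
    1  jkl-ghi-jkl-mno       3

    # Keep only the rows where all 4 pieces are distinct; a row with a
    # repeated piece has fewer than 4 unique values.
    df1 = df1.loc[df1['length'] == 4]

    print(df1)

                  Col1  length
    0  abc-def-ghi-jkl       4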
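
The hard-coded 4 only works when every string has exactly four pieces. A rough sketch of a more general version compares each row's number of pieces with its number of unique pieces, which is the `len(...) > len(set(...))` test from the question applied to one row at a time. The names `parts`, `has_duplicate` and `result` are my own, and this assumes `Col1` contains no missing values:

```python
import pandas as pd

df1 = pd.DataFrame({'Col1': ['abc-def-ghi-jkl', 'jkl-ghi-jkl-mno']})

# Split each string into its pieces (one list per row).
parts = df1['Col1'].str.split('-')

# A row has a repeated piece when it has more pieces than unique pieces.
# Assumes every value in Col1 is a string (no NaN).
has_duplicate = parts.apply(lambda p: len(p) > len(set(p)))

# Keep only the rows without a repeated piece.
result = df1.loc[~has_duplicate]

print(result)
```

With the example data, only the row abc-def-ghi-jkl should remain. The original `set(List)` call fails because it tries to put the inner lists themselves into a set, and lists are not hashable; applying `set` to each inner list separately avoids that.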
