How to find duplicates in pandas dataframe

sa_zy


Suppose I have the following series in pandas:

>>> p
0     0.0
1     0.0
2     0.0
3     0.3
4     0.3
5     0.3
6     0.3
7     0.3
8     1.0
9     1.0
10    1.0
11    0.2
12    0.2
13    0.3
14    0.3
15    0.3

For each sequence of consecutive duplicates, I need its first and last index. In the example above, the first sequence of 0.3 (index 3 to 7) must be identified separately from the last sequence of 0.3 (index 13 to 15).

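Series.duplicated is not sufficient, because it looks at the whole series rather than at consecutive runs:

* keep='first' marks the first occurrence of each value as False, but index 13 is marked True because it is not the first appearance of 0.3 overall.

* keep='last' has the same problem in the other direction: index 7 is marked True because 0.3 appears again later.

* keep=False simply marks every entry as True.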
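To make the limitation concrete, here is a small sketch that rebuilds the series from the example (the variable names `first`, `last` and `both` are only illustrative) and checks what each `keep` option reports at the boundaries of the 0.3 runs:

```python
import pandas as pd

# The series from the question, rebuilt by hand
p = pd.Series([0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3)

first = p.duplicated(keep='first')   # only the first occurrence overall is False
last = p.duplicated(keep='last')     # only the last occurrence overall is False
both = p.duplicated(keep=False)      # every value that occurs more than once is True

# Index 3 starts the first run of 0.3, index 13 starts the second one
print(first.loc[[3, 13]].tolist())   # [False, True]

# Index 7 ends the first run of 0.3, index 15 ends the second one
print(last.loc[[7, 15]].tolist())    # [True, False]

# Every value in this series occurs at least twice
print(both.all())                    # True
```

None of the three masks separates the run at 3 to 7 from the run at 13 to 15, which is why a run-aware approach is needed.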

Thank you!

jezrael

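I believe you need a trick: compare each value with the shifted (previous) value for inequality using ne, take the cumsum to number the runs, and then use drop_duplicates to get the first and last index of each run: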

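import pandas as pd

# Column 'a' holds the series from the question
df = pd.DataFrame({'a': [0.0]*3 + [0.3]*5 + [1.0]*3 + [0.2]*2 + [0.3]*3})

# Run id: increases by 1 each time the value differs from the previous row
s = df['a'].ne(df['a'].shift()).cumsum()
a = s.drop_duplicates().index              # first index of each run
b = s.drop_duplicates(keep='last').index   # last index of each run

# Use a new name so the original df stays available
out = pd.DataFrame({'first': a, 'last': b})
print(out)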
   first  last
0      0     2
1      3     7
2      8    10
3     11    12
4     13    15
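As a rough illustration of why the shift/ne/cumsum combination numbers the runs, the sketch below walks through the intermediate results on a shorter, made-up series (`x`, `shifted`, `starts` and `run_id` are names chosen for this example only):

```python
import pandas as pd

# A short made-up series with three runs: 0.0, 0.3 and 0.0 again
x = pd.Series([0.0, 0.0, 0.3, 0.3, 0.0])

shifted = x.shift()        # previous value of each row; NaN for the first row
starts = x.ne(shifted)     # True wherever the value differs from the previous row
run_id = starts.cumsum()   # counting the True values gives one number per run

print(starts.tolist())     # [True, False, True, False, True]
print(run_id.tolist())     # [1, 1, 2, 2, 3]
```

The second run of 0.0 gets its own number (3) instead of being merged with the first one, which is exactly what Series.duplicated cannot do. One caveat: NaN compares as not equal to NaN, so consecutive missing values would each start a new run with this approach.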

If you also want the duplicated value in a new column, change the solution slightly and use duplicated:

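# df is the original frame with column 'a'
# Run id: increases by 1 each time the value differs from the previous row
s = df['a'].ne(df['a'].shift()).cumsum()
a = df.loc[~s.duplicated(), 'a']       # value at the first row of each run
b = s.drop_duplicates(keep='last')     # last row of each run

# The result keeps the index of a, i.e. the first index of each run
out = pd.DataFrame({'first': a.index, 'last': b.index, 'val': a})
print(out)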
    first  last  val
0       0     2  0.0
3       3     7  0.3
8       8    10  1.0
11     11    12  0.2
13     13    15  0.3
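An alternative way to get the same three columns, not part of the answer above but sometimes easier to read, is to group by the run number and aggregate. This is only a sketch; the names `run_id` and `runs` are assumptions made for the example:

```python
import pandas as pd

# Column 'a' holds the series from the question
df = pd.DataFrame({'a': [0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3})

# Run number per row, renamed so it does not clash with column 'a'
run_id = df['a'].ne(df['a'].shift()).cumsum().rename('run')

# reset_index() turns the index into a regular column named 'index',
# so it can be aggregated together with the values
runs = (
    df.reset_index()
      .groupby(run_id)
      .agg(first=('index', 'first'), last=('index', 'last'), val=('a', 'first'))
)
print(runs)
```

Here the result is indexed by the run number (1 to 5) instead of by the first index of each run, and the named aggregation syntax requires a reasonably recent pandas version.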

If you need the run number as a new column:

# Label each row with the number of the run it belongs to
df['count'] = df['a'].ne(df['a'].shift()).cumsum()
print(df)
      a  count
0   0.0      1
1   0.0      1
2   0.0      1
3   0.3      2
4   0.3      2
5   0.3      2
6   0.3      2
7   0.3      2
8   1.0      3
9   1.0      3
10  1.0      3
11  0.2      4
12  0.2      4
13  0.3      5
14  0.3      5
15  0.3      5
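Once the run number is stored as a column, it can serve as a groupby key. As one possible use, the sketch below computes the length of each run and attaches it to every row; the column name `run_len` is hypothetical and only used for illustration:

```python
import pandas as pd

# Column 'a' holds the series from the question
df = pd.DataFrame({'a': [0.0] * 3 + [0.3] * 5 + [1.0] * 3 + [0.2] * 2 + [0.3] * 3})
df['count'] = df['a'].ne(df['a'].shift()).cumsum()

# Number of rows in each run, one entry per run
sizes = df.groupby('count').size()
print(sizes.tolist())   # [3, 5, 3, 2, 3]

# The same length broadcast back to every row of the run
df['run_len'] = df.groupby('count')['a'].transform('size')

# Rows that belong to a run of at least two equal values
in_run = df[df['run_len'] > 1]
```

In this example every row is part of a run of length two or more, so `in_run` contains all 16 rows; with data that has isolated values, the filter would drop them.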
