Remove Quasi Duplicates In Pandas

Jeff Saltfist

I have a Pandas data frame that looks as follows:

import pandas as pd
# Raw string so the backslashes in the Windows path are not treated as escapes.
data = pd.read_csv(r'C:\Users\Frank\Desktop\10-25-16-54-7-IMPORT.csv', index_col=False)
print(data.head(10))

                      Date Symbol  Value
0  2015-03-18 01:54:35 UTC   NKTR -0.290
1  2015-03-18 02:10:49 UTC    DRQ -0.082
2  2015-03-18 03:03:10 UTC   NKTR -0.290
3  2015-03-18 03:13:17 UTC    UAM  0.414
4  2015-03-18 03:48:24 UTC   ROCK  0.000
5  2015-03-18 03:56:30 UTC   ROCK  0.000
6  2015-03-18 04:52:24 UTC    MTZ -0.290
7  2015-03-18 05:00:29 UTC   NKTR -0.290
8  2015-03-18 05:04:31 UTC   NKTR -0.290
9  2015-03-18 05:29:48 UTC   PSEC -0.046

I want to keep only the first row for each symbol on a given day and remove every later row with that same symbol on that day (for example, the repeated "NKTR" rows above). Is this possible?

(Simply dropping duplicate rows will not work, because the rows have different timestamps.)

Psidom

You can group by the calendar date of the Date column together with Symbol, then take the first row of each group:

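import pandas as pd
# df is the DataFrame from the question (named data there).
# Group on (calendar date, Symbol) and keep the first entry of each group.
df.groupby([pd.to_datetime(df.Date).dt.date, 'Symbol'], as_index=False).first()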

#  Symbol                      Date  Value
#0    DRQ   2015-03-18 02:10:49 UTC -0.082
#1    MTZ   2015-03-18 04:52:24 UTC -0.290
#2   NKTR   2015-03-18 01:54:35 UTC -0.290
#3   PSEC   2015-03-18 05:29:48 UTC -0.046
#4   ROCK   2015-03-18 03:48:24 UTC  0.000
#5    UAM   2015-03-18 03:13:17 UTC  0.414
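
As the output above shows, groupby() returns the groups sorted by key rather than in the original row order, and first() takes the first non-null value of each column. If the original order matters, one possible alternative is to flag repeated (day, Symbol) pairs with duplicated() and filter them out. The following is only a sketch under some assumptions: the sample frame is rebuilt by hand from the rows in the question, the names day, is_repeat and deduped are made up for illustration, and the rows are assumed to already be in chronological order (sort by Date first if they are not).

```python
import pandas as pd

# Hand-built sample mirroring the rows in the question; the column
# name "Value" is taken from the answer's output above.
df = pd.DataFrame({
    "Date": [
        "2015-03-18 01:54:35 UTC",
        "2015-03-18 02:10:49 UTC",
        "2015-03-18 03:03:10 UTC",
        "2015-03-18 03:13:17 UTC",
        "2015-03-18 03:48:24 UTC",
        "2015-03-18 03:56:30 UTC",
        "2015-03-18 04:52:24 UTC",
        "2015-03-18 05:00:29 UTC",
        "2015-03-18 05:04:31 UTC",
        "2015-03-18 05:29:48 UTC",
    ],
    "Symbol": ["NKTR", "DRQ", "NKTR", "UAM", "ROCK",
               "ROCK", "MTZ", "NKTR", "NKTR", "PSEC"],
    "Value": [-0.290, -0.082, -0.290, 0.414, 0.000,
              0.000, -0.290, -0.290, -0.290, -0.046],
})

# Calendar day (in UTC) of each timestamp, used only as a comparison key.
day = pd.to_datetime(df["Date"], utc=True).dt.date

# True for every row whose (day, Symbol) pair already appeared in an earlier row.
is_repeat = pd.DataFrame({"day": day, "Symbol": df["Symbol"]}).duplicated()

# Keep the first occurrence of each pair; original order and index are preserved.
deduped = df[~is_repeat]
print(deduped)
```

Because the day is kept in a separate Series, the Date column is left untouched and no helper column has to be dropped afterwards.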
