I have a Pandas data frame that looks as follows:
import pandas as pd
data = pd.read_csv('C:\Users\Frank\Desktop\\10-25-16-54-7-IMPORT.csv', index_col=False)
print data.head(10)
Date Symbol
0 2015-03-18 01:54:35 UTC NKTR -0.290
1 2015-03-18 02:10:49 UTC DRQ -0.082
2 2015-03-18 03:03:10 UTC NKTR -0.290
3 2015-03-18 03:13:17 UTC UAM 0.414
4 2015-03-18 03:48:24 UTC ROCK 0.000
5 2015-03-18 03:56:30 UTC ROCK 0.000
6 2015-03-18 04:52:24 UTC MTZ -0.290
7 2015-03-18 05:00:29 UTC NKTR -0.290
8 2015-03-18 05:04:31 UTC NKTR -0.290
9 2015-03-18 05:29:48 UTC PSEC -0.046
I want to remove every row with a duplicate symbol (in this case "NKTR") that occurs subsequent to the first instance of that same symbol on that same day. Is this possible?
(removing duplicates will not work because of the different time stamp of the rows).
You can try groupby()
the date of the Date
column and Symbol
then take the first row of each group:
import pandas as pd
df.groupby([pd.to_datetime(df.Date).dt.date, 'Symbol'], as_index=False).first()
# Symbol Date Value
#0 DRQ 2015-03-18 02:10:49 UTC -0.082
#1 MTZ 2015-03-18 04:52:24 UTC -0.290
#2 NKTR 2015-03-18 01:54:35 UTC -0.290
#3 PSEC 2015-03-18 05:29:48 UTC -0.046
#4 ROCK 2015-03-18 03:48:24 UTC 0.000
#5 UAM 2015-03-18 03:13:17 UTC 0.414
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments