My data in principle looks like this:
                            one  two
timestamp
2013-12-06 00:00:01.200000    1    1
2013-12-06 00:00:02.200000    1    2
2013-12-06 00:00:03.200000    2    1
2013-12-06 00:00:04.200000    3    5
2013-12-06 00:00:05.200000    1    2
I would like to group it by column 'one' and take the first timestamp of each group. Doing this for column 'two' works just fine, but it does not work for the timestamp index.
df_2 = df['two'].groupby(df['one']).first()
gives:
one
1 1
2 1
3 5
but I get an error saying there is no attribute 'first' when I apply the same thing to the index:
df_3 = df.index.groupby(df['one']).first()
Does anyone know how this can be done?
You could use groupby/apply:
>>> grouped = df.groupby('one')
>>> grouped.apply(lambda x: x.index[0])
one
1 2013-12-06 00:00:01.200000
2 2013-12-06 00:00:03.200000
3 2013-12-06 00:00:04.200000
dtype: datetime64[ns]
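If you would rather avoid apply, here is a minimal sketch of two alternatives that should give the same result. It assumes the frame looks like the sample in the question; the construction of df below is my own reconstruction of that sample, not the asker's actual code.

```python
import pandas as pd

# Rebuild the sample frame from the question (values copied from the post).
timestamps = pd.to_datetime([
    "2013-12-06 00:00:01.200000",
    "2013-12-06 00:00:02.200000",
    "2013-12-06 00:00:03.200000",
    "2013-12-06 00:00:04.200000",
    "2013-12-06 00:00:05.200000",
])
df = pd.DataFrame(
    {"one": [1, 1, 2, 3, 1], "two": [1, 2, 1, 5, 2]},
    index=pd.DatetimeIndex(timestamps, name="timestamp"),
)

# Option 1: turn the index into a Series, so the usual Series.groupby applies.
first_ts = df.index.to_series().groupby(df["one"]).first()

# Option 2: move the index into a regular column, then group as usual.
first_ts_alt = df.reset_index().groupby("one")["timestamp"].first()

print(first_ts)
```

Both variants take the first row of each group in its existing order, so if the frame is not already sorted by timestamp you may want to call df.sort_index() first.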
By the way,
df_2 = df['two'].groupby(df['one']).first()
can also be expressed as
>>> grouped['two'].first()
one
1 1
2 1
3 5
Name: two, dtype: int64
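As for the error in the question: as far as I can tell, Index.groupby does not return a GroupBy object. It returns a dict that maps each group key to the index labels belonging to it, and a dict has no 'first' method. The sketch below illustrates this on a reconstruction of the sample data (the df construction is an assumption based on the post), and pulls the first label out of each entry by hand.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame(
    {"one": [1, 1, 2, 3, 1], "two": [1, 2, 1, 5, 2]},
    index=pd.DatetimeIndex(
        [
            "2013-12-06 00:00:01.200000",
            "2013-12-06 00:00:02.200000",
            "2013-12-06 00:00:03.200000",
            "2013-12-06 00:00:04.200000",
            "2013-12-06 00:00:05.200000",
        ],
        name="timestamp",
    ),
)

# Index.groupby yields a dict: group key -> index labels in that group.
groups = df.index.groupby(df["one"].to_numpy())

# The first timestamp of each group can be read off the dict manually.
first_per_group = {key: labels[0] for key, labels in groups.items()}
print(first_per_group)
```

This works, but the groupby-based versions are usually more convenient because they return a Series rather than a dict.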