Pandas: use groupby to count difference between dates

ldevyataykina

I have df:

i,Unnamed,ID,url,used_at,active_seconds,domain,subdomain,search_engine,search_term,diff_time,period
0,322015,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/antoninaribina,2015-12-31 09:16:05,35,vk.com,vk.com,None,None,,1
1,838267,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed,2015-12-31 09:16:38,54,vk.com,vk.com,None,None,33.0,1
2,838271,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:17:32,34,vk.com,vk.com,None,None,54.0,1
3,322026,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos&z=photo143297356_397216312%2Ffeed1_143297356_1451504298,2015-12-31 09:18:06,4,vk.com,vk.com,None,None,34.0,1
4,838275,0120bc30e78ba5582617a9f3d6dfd8ca,vk.com/feed?section=photos,2015-12-31 09:18:10,4,vk.com,vk.com,None,None,4.0,1
5,322028,7602962fb83ac2e2a0cb44158ca88464,vk.com/feed?section=comments,2015-12-29 09:18:14,8,vk.com,vk.com,None,None,4.0,1
6,322029,7602962fb83ac2e2a0cb44158ca88464,megarand.ru/contest/121070,2015-12-30 09:18:22,16,megarand.ru,megarand.ru,None,None,8.0,1
7,1870917,7602962fb83ac2e2a0cb44158ca88464,vk.com/feed?section=comments,2015-12-31 09:18:38,6,vk.com,vk.com,None,None,16.0,1

I need to print dirrerence between first and last date to every ID. How can I do it? I tried to use df.groupby('ID')['used_at'].diff().dt.seconds but it print difference between every 2 strings

jezrael

I think you need groupby with difference with first and last:

g = df.groupby('ID')['used_at']
print (g.first() - g.last())
ID
0120bc30e78ba5582617a9f3d6dfd8ca   -1 days +23:57:55
7602962fb83ac2e2a0cb44158ca88464   -3 days +23:59:36
Name: used_at, dtype: timedelta64[ns]

Or apply iloc:

print (df.groupby('ID')['used_at'].apply(lambda g: g.iloc[0] - g.iloc[-1]))
ID
0120bc30e78ba5582617a9f3d6dfd8ca   -1 days +23:57:55
7602962fb83ac2e2a0cb44158ca88464   -3 days +23:59:36
Name: used_at, dtype: timedelta64[ns]

With converting timedelta to seconds:

g = df.groupby('ID')['used_at']
print ((g.first() - g.last()).dt.seconds)
ID
0120bc30e78ba5582617a9f3d6dfd8ca    86275
7602962fb83ac2e2a0cb44158ca88464    86376
Name: used_at, dtype: int64

print (df.groupby('ID')['used_at'].apply(lambda g: g.iloc[0] - g.iloc[-1]).dt.seconds)
ID
0120bc30e78ba5582617a9f3d6dfd8ca    86275
7602962fb83ac2e2a0cb44158ca88464    86376
Name: used_at, dtype: int64

Thank you juanpa.arrivillaga for comment:

If datetimes are sorted, you can use:

df.groupby('ID').used_at.min() - df.groupby('ID').used_at.max()

Timings:

In [216]: %timeit (a(df))
The slowest run took 4.30 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.78 ms per loop

In [217]: %timeit (b(df))
1000 loops, best of 3: 1.8 ms per loop

In [218]: %timeit (df.groupby('ID')['used_at'].apply(lambda g: g.iloc[0] - g.iloc[-1]).dt.seconds)
1000 loops, best of 3: 1.53 ms per loop

In [219]: %timeit (df.groupby('ID').agg(['first','last']).apply( lambda r: r['used_at','first'] - r['used_at','last'], axis=1).dt.seconds)
100 loops, best of 3: 14.4 ms per loop

Code for timings:

df = pd.concat([df]*1000).reset_index(drop=True)

def a(df):
    g = df.groupby('ID')['used_at']
    return ((g.first() - g.last()).dt.seconds)

def b(df):
    g = df.groupby('ID')['used_at']
    return ((g.min() - g.max()).dt.seconds)

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Pandas: use groupby to count difference between dates

From Dev

Count Number of Rows Between Two Dates BY ID in a Pandas GroupBy Dataframe

From Dev

Count Number of Rows Between Two Dates BY ID in a Pandas GroupBy Dataframe

From Dev

Difference between dates in Pandas dataframe

From Dev

split, groupby, combine in Pandas to find a difference in dates

From Dev

Count difference between two dates - datetimepicker

From Dev

Count difference between two dates - datetimepicker

From Dev

Python Pandas - Simple Row Count with Dates and Groupby Calculation

From Java

What is the difference between size and count in pandas?

From Dev

pandas dataframe confusing difference between .count and .sum

From Dev

Jquery datepicker count difference between two dates wont work

From Dev

Wrong count of difference days between 2 dates with joda time?

From Dev

Pandas reindex dates in Groupby

From Dev

Do not need the 'days' part for difference between dates field in Python Pandas

From Dev

SQL Count between dates

From Dev

PHP difference between dates

From Dev

Get Difference Between Dates

From Dev

Difference between dates by awk

From Dev

Difference between dates and today

From Dev

Difference between two dates #

From Dev

difference between dict(groupby) and groupby

From Dev

difference between dict(groupby) and groupby

From Dev

Difference between "as_index = False", and "reset_index()" in pandas groupby

From Dev

Why is difference between 2 Dates off by 1 minute if use Date()?

From Dev

Pandas conditional groupby count

From Dev

Pandas - count groupBy results

From Dev

pandas groupby count rate

From Dev

Use groupby in Pandas to count things in one column in comparison to another

From Dev

How can I use pandas groupby.count() for a condition

Related Related

  1. 1

    Pandas: use groupby to count difference between dates

  2. 2

    Count Number of Rows Between Two Dates BY ID in a Pandas GroupBy Dataframe

  3. 3

    Count Number of Rows Between Two Dates BY ID in a Pandas GroupBy Dataframe

  4. 4

    Difference between dates in Pandas dataframe

  5. 5

    split, groupby, combine in Pandas to find a difference in dates

  6. 6

    Count difference between two dates - datetimepicker

  7. 7

    Count difference between two dates - datetimepicker

  8. 8

    Python Pandas - Simple Row Count with Dates and Groupby Calculation

  9. 9

    What is the difference between size and count in pandas?

  10. 10

    pandas dataframe confusing difference between .count and .sum

  11. 11

    Jquery datepicker count difference between two dates wont work

  12. 12

    Wrong count of difference days between 2 dates with joda time?

  13. 13

    Pandas reindex dates in Groupby

  14. 14

    Do not need the 'days' part for difference between dates field in Python Pandas

  15. 15

    SQL Count between dates

  16. 16

    PHP difference between dates

  17. 17

    Get Difference Between Dates

  18. 18

    Difference between dates by awk

  19. 19

    Difference between dates and today

  20. 20

    Difference between two dates #

  21. 21

    difference between dict(groupby) and groupby

  22. 22

    difference between dict(groupby) and groupby

  23. 23

    Difference between "as_index = False", and "reset_index()" in pandas groupby

  24. 24

    Why is difference between 2 Dates off by 1 minute if use Date()?

  25. 25

    Pandas conditional groupby count

  26. 26

    Pandas - count groupBy results

  27. 27

    pandas groupby count rate

  28. 28

    Use groupby in Pandas to count things in one column in comparison to another

  29. 29

    How can I use pandas groupby.count() for a condition

HotTag

Archive