I'm just getting started with Pandas and am trying to combine: Grouping my data by date, and counting the unique values in each group.
Here's what my data looks like:
User, Type
Datetime
2014-04-15 11:00:00, A, New
2014-04-15 12:00:00, B, Returning
2014-04-15 13:00:00, C, New
2014-04-20 14:00:00, D, New
2014-04-20 15:00:00, B, Returning
2014-04-20 16:00:00, B, Returning
2014-04-20 17:00:00, D, Returning
And here's what I would like to get to: Resample the datetime index to the day (which I can do), and also count the unique users for each day. I'm not interested in the 'Type' column yet.
Day, Unique Users
2014-04-15, 3
2014-04-20, 2
I'm trying df.user.resample('D', how='count').unique
but it doesn't seem to give me the right answer.
You don't need to do a resample to get the desired output in your question. I think you can get by with just a groupby
on date:
print df.groupby(df.index.date)['User'].nunique()
2014-04-15 3
2014-04-20 2
dtype: int64
And then if you want to you could resample to fill in the time series gaps after you count the unique users:
cnt = df.groupby(df.index.date)['User'].nunique()
cnt.index = cnt.index.to_datetime()
print cnt.resample('D')
2014-04-15 3
2014-04-16 NaN
2014-04-17 NaN
2014-04-18 NaN
2014-04-19 NaN
2014-04-20 2
Freq: D, dtype: float64
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments