One-year rolling count of unique values by group in pandas

Juan C

So I have the following dataframe:

Period      group     ID    
20130101     A        10
20130101     A        20
20130301     A        20
20140101     A        20
20140301     A        30
20140401     A        40

20130101     B        11
20130201     B        21
20130401     B        31
20140401     B        41
20140501     B        51
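
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the table above as a DataFrame. It assumes Period arrives as YYYYMMDD strings and is parsed to datetime; the variable name `rows` is only for illustration.

```python
import pandas as pd

# Sample rows copied from the table above (Period, group, ID).
rows = [
    ("20130101", "A", 10),
    ("20130101", "A", 20),
    ("20130301", "A", 20),
    ("20140101", "A", 20),
    ("20140301", "A", 30),
    ("20140401", "A", 40),
    ("20130101", "B", 11),
    ("20130201", "B", 21),
    ("20130401", "B", 31),
    ("20140401", "B", 41),
    ("20140501", "B", 51),
]
df = pd.DataFrame(rows, columns=["Period", "group", "ID"])

# Assumption: Period is stored as YYYYMMDD text and needs parsing.
df["Period"] = pd.to_datetime(df["Period"], format="%Y%m%d")
```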

I need to count how many distinct IDs there are per group over the last year. So my desired output would look like this:

Period      group     num_ids_last_year
20130101     A            2 # ID 10 and 20 in the last year
20130301     A            2 
20140101     A            2 
20140301     A            2 # ID 30 enters, ID 10 leaves
20140401     A            3 # ID 40 enters

20130101     B            1
20130201     B            2
20130401     B            3
20140401     B            2 # ID 11 and 21 leave 
20140501     B            2 # ID 31 leaves, ID 51 enters

Period is in datetime format. I tried many things along the lines of:

df.groupby(['group','Period'])['ID'].nunique() # Get number of IDs by group in a given period.
df.groupby(['group'])['ID'].nunique() # Get total number of IDs by group.

df.set_index('Period').groupby('group')['ID'].rolling(window=1, freq='Y').nunique()

But the last one isn't even possible. Is there any straightforward way to do this? I'm thinking of some combination of cumcount() and pd.DateOffset, or maybe ge(df.Period - dt.timedelta(365)), but I can't find the answer.

Thanks.

Edit: added the fact that there can be more than one ID in a given Period.

Steven G

Looking at your data structure, I am guessing you have many duplicates, so start by dropping them; drop_duplicates tends to be fast.

I am assuming that the df['Period'] column is of dtype datetime64[ns].

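import pandas as pd

df = df.drop_duplicates()
results = dict()
for start in df['Period'].drop_duplicates().sort_values():
    # 'start' is the date being evaluated; 'end' is the far edge of its one-year look-back window
    end = start - pd.DateOffset(years=1)
    screen = (df.Period <= start) & (df.Period >= end)  # screen for 1 year of data (both ends inclusive)
    singles = df.loc[screen, ['group', 'ID']].drop_duplicates()  # unique (group, ID) pairs in the window
    x = singles.groupby('group').count()  # number of distinct IDs per group
    results[start] = x
results = pd.concat(results)  # outer index level: evaluated date; inner level: group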


results
                  ID
           group    
2013-01-01 A       2
           B       1
2013-02-01 A       2
           B       2
2013-03-01 A       2
           B       2
2013-04-01 A       2
           B       3
2014-01-01 A       2
           B       3
2014-03-01 A       2
           B       1
2014-04-01 A       3
           B       2
2014-05-01 A       3
           B       2

Is that any faster?
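
Note that the loop above evaluates every period for every group, so it returns rows such as (2013-02-01, A) that are not in the desired output. Below is a minimal sketch of a per-group variant that only evaluates the periods actually present in each group. The function name `rolling_year_nunique` is made up for illustration, and the window is assumed to be inclusive on both ends, which is what the desired output in the question implies.

```python
import pandas as pd

# Sample data from the question, with Period parsed to datetime.
df = pd.DataFrame(
    {
        "Period": pd.to_datetime(
            ["20130101", "20130101", "20130301", "20140101", "20140301", "20140401",
             "20130101", "20130201", "20130401", "20140401", "20140501"],
            format="%Y%m%d",
        ),
        "group": ["A"] * 6 + ["B"] * 5,
        "ID": [10, 20, 20, 20, 30, 40, 11, 21, 31, 41, 51],
    }
)


def rolling_year_nunique(frame):
    """Count distinct IDs per group over the year ending at each Period.

    Hypothetical helper: the window [Period - 1 year, Period] is inclusive
    on both ends.
    """
    out = []
    for group, g in frame.groupby("group"):
        # Only evaluate periods that actually occur in this group.
        for period in g["Period"].drop_duplicates().sort_values():
            window_start = period - pd.DateOffset(years=1)
            in_window = (g["Period"] >= window_start) & (g["Period"] <= period)
            out.append((period, group, g.loc[in_window, "ID"].nunique()))
    return pd.DataFrame(out, columns=["Period", "group", "num_ids_last_year"])


result = rolling_year_nunique(df)
```

This is still a Python-level loop over (group, period) pairs, so it is a readability sketch rather than a performance fix; on large frames it may be no faster than the version above.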

P.S. If df['Period'] is not a datetime:

df['Period'] = pd.to_datetime(df['Period'].astype(str), format='%Y%m%d')  # cast to str so integer-coded dates parse too
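
If some values might not be valid dates, one option is to coerce failures to NaT and inspect them, rather than letting them pass through silently. A small sketch, using a made-up `raw` series with one deliberately malformed entry:

```python
import pandas as pd

# Hypothetical raw column: integer-coded dates plus one malformed entry.
raw = pd.Series([20130101, 20130301, 99999999])

# Cast to str so the format string applies; unparseable values become NaT.
parsed = pd.to_datetime(raw.astype(str), format="%Y%m%d", errors="coerce")

# Boolean mask of rows that failed to parse, for inspection or removal.
bad_rows = parsed.isna()
```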

Edited at 2020-12-12
