pandas: sort within each group, then aggregate

Travis

I am analyzing search engine queries. Within one session, a user may issue several different queries to the Google search engine, one after another, at different times.

My data has several fields: session_id, log_time, query, feature_i, etc. I want to group by session_id and then combine each group's rows into a single row, ordered by log_time, so that the output represents each user's behavior as a time series.

dataset

Code:

import pandas as pd

toy_data = pd.DataFrame({'session_id':[1,2,1,2,3,3],
             'log_time':[4,5,6,1,2,3],
             'query':['hi','dude','pandas','groupby','sort','agg'],
             'cate_feat_0':['apple','banana']*3,
             'num_feat_0':[1,2,3,4,5,6]})
print(toy_data)

Output:

   session_id  log_time    query cate_feat_0  num_feat_0
0           1         4       hi       apple           1
1           2         5     dude      banana           2
2           1         6   pandas       apple           3
3           2         1  groupby      banana           4
4           3         2     sort       apple           5
5           3         3      agg      banana           6

What I want:

## note that all lists are sorted by log_time within each session_id group
session_id    query_list         log_time_list    cate_feat_0_list    num_feat_0_list
    1         [hi, pandas]       [4, 6]           [apple, apple]      [1, 3]
    2         [groupby, dude]    [1, 5]           [banana, banana]    [4, 2]
    3         [sort, agg]        [2, 3]           [apple, banana]     [5, 6]

My attempt

First, group by session_id and aggregate each column into a list:

toy_data_res = toy_data.groupby('session_id').agg({'query':list, 'log_time':list, 'cate_feat_0':list, 'num_feat_0':list})
toy_data_res

Gives:

                      query log_time       cate_feat_0 num_feat_0
session_id                                                       
1              [hi, pandas]   [4, 6]    [apple, apple]     [1, 3]
2           [dude, groupby]   [5, 1]  [banana, banana]     [2, 4]
3               [sort, agg]   [2, 3]   [apple, banana]     [5, 6]

Then sort the lists within each session:

import numpy as np

for i in toy_data_res.index:
    sort_index = np.argsort(toy_data_res.loc[i, 'log_time'])  # time order within the group
    for col in toy_data_res.columns:
        # .at writes to a single cell, so the reordered list is stored as one object
        toy_data_res.at[i, col] = [toy_data_res.at[i, col][j] for j in sort_index]
toy_data_res

Gives:

                      query log_time       cate_feat_0 num_feat_0
session_id                                                       
1              [hi, pandas]   [4, 6]    [apple, apple]     [1, 3]
2           [groupby, dude]   [1, 5]  [banana, banana]     [4, 2]
3               [sort, agg]   [2, 3]   [apple, banana]     [5, 6]
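
As a side note on the loop above: np.argsort returns the positions that would put a sequence in ascending order, and indexing every column's list with those same positions keeps the lists aligned with each other. A minimal sketch for session 2 alone (the variable names here are illustrative and not from the original post):

```python
import numpy as np

# Values for session_id == 2, in their original (unsorted) row order.
log_time = [5, 1]
query = ['dude', 'groupby']

# Positions that would sort log_time in ascending order.
order = np.argsort(log_time)

# Apply the same positions to every list so they stay aligned.
sorted_log_time = [log_time[j] for j in order]
sorted_query = [query[j] for j in order]

print(order.tolist())    # [1, 0]
print(sorted_log_time)   # [1, 5]
print(sorted_query)      # ['groupby', 'dude']
```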

My approach is quite slow. Is there a better way to do groupby -> sort within group -> aggregation?

Tip: in SQL, STRING_AGG or GROUP_CONCAT can be used to sort within a group.

jezrael

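Use DataFrame.sort_values before groupby. groupby preserves the order of rows within each group, so the aggregated lists come out already sorted by log_time. If the same function should be applied to several columns, select them with a list of column names: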

df = (toy_data.sort_values(['session_id','log_time'])
              .groupby('session_id')[['query','log_time','cate_feat_0', 'num_feat_0']]
              .agg(list))

    
print(df)
                      query log_time       cate_feat_0 num_feat_0
session_id                                                       
1              [hi, pandas]   [4, 6]    [apple, apple]     [1, 3]
2           [groupby, dude]   [1, 5]  [banana, banana]     [4, 2]
3               [sort, agg]   [2, 3]   [apple, banana]     [5, 6]
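
If the column names should also match the layout requested in the question (query_list, log_time_list, and so on), one possible follow-up is to add a suffix and move session_id back into a regular column. This is only a sketch: it rebuilds the toy data so that it runs on its own, and the names `cols` and `result` are illustrative rather than part of the original answer.

```python
import pandas as pd

toy_data = pd.DataFrame({'session_id': [1, 2, 1, 2, 3, 3],
                         'log_time': [4, 5, 6, 1, 2, 3],
                         'query': ['hi', 'dude', 'pandas', 'groupby', 'sort', 'agg'],
                         'cate_feat_0': ['apple', 'banana'] * 3,
                         'num_feat_0': [1, 2, 3, 4, 5, 6]})

cols = ['query', 'log_time', 'cate_feat_0', 'num_feat_0']

result = (toy_data.sort_values(['session_id', 'log_time'])  # sort first
                  .groupby('session_id')[cols]
                  .agg(list)                 # lists keep the sorted row order
                  .add_suffix('_list')       # query -> query_list, etc.
                  .reset_index())            # session_id becomes a column again
print(result)
```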
