Consider the following DataFrame:
link            tags  views
  /a  [tag_a, tag_b]    100
  /b  [tag_a, tag_c]    200
  /c  [tag_b, tag_c]    150
What would be an efficient way to 'groupby' items within a list in the tags column? For instance, if one were to find the total views for each tag in the DataFrame above, the result would be:
tag    views
tag_a    300
tag_b    250
tag_c    350
So far, this is what I have come up with:
# get all unique tags
all_tags = list(set([item for sublist in df.tags.tolist() for item in sublist]))
# sum the views for each tag
tag_views = {tag: df[df.tags.map(lambda x: tag in x)].views.sum() for tag in all_tags}
This approach is rather slow for a large dataset. Is there a more efficient way (perhaps using the built-in groupby function) of doing this?
You could split the tags column into multiple rows and then groupby:
df = pd.DataFrame(...)
tag = pd.DataFrame(df.tags.tolist()).stack()
tag.index = tag.index.droplevel(-1)
tag.name = 'tag'
df.join(tag).groupby('tag').sum()
Result:
       views
tag
tag_a    300
tag_b    250
tag_c    350
This will not be very space-efficient because of the join, especially for a high number of tags per URL. For a small number of tags, I would be interested to hear about the timings.
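For reference, here is one way to make this approach self-contained, rebuilding the example DataFrame from the question's data (a sketch; it assumes pandas is importable as pd):

import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'link': ['/a', '/b', '/c'],
    'tags': [['tag_a', 'tag_b'], ['tag_a', 'tag_c'], ['tag_b', 'tag_c']],
    'views': [100, 200, 150],
})

# One row per (original row, tag). The inner index level records the
# position of each tag within its list and is dropped afterwards.
tag = pd.DataFrame(df.tags.tolist(), index=df.index).stack()
tag.index = tag.index.droplevel(-1)
tag.name = 'tag'

# Attach the expanded tag column back to the original rows and aggregate.
print(df.join(tag).groupby('tag').views.sum())

On pandas 0.25 or newer, df.explode('tags').groupby('tags').views.sum() should give the same result in a single step, although I have not compared the timings.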
Alternatively, use a multi-index:
df = pd.DataFrame(...)
all_tags = [...]
groups = df.tags.map(lambda cell: tuple(tag in cell for tag in all_tags))
df.index = pd.MultiIndex.from_tuples(groups.values, names=all_tags)
for t in all_tags:
    print(t, df.xs(True, level=t).views.sum())
Result:
tag_a 300
tag_b 250
tag_c 350
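And a self-contained sketch of the multi-index variant, again built from the question's example data; deriving all_tags from the tags column and collecting the sums into a Series instead of printing them are my own assumptions about how the placeholders would be filled in:

import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    'link': ['/a', '/b', '/c'],
    'tags': [['tag_a', 'tag_b'], ['tag_a', 'tag_c'], ['tag_b', 'tag_c']],
    'views': [100, 200, 150],
})

# Derive the unique tags, much as the question does.
all_tags = sorted({t for tags in df.tags for t in tags})

# One boolean index level per tag: True if the row carries that tag.
groups = df.tags.map(lambda cell: tuple(tag in cell for tag in all_tags))
df.index = pd.MultiIndex.from_tuples(list(groups), names=all_tags)

# For each tag, sum the views of every row whose level for that tag is True.
result = pd.Series({t: df.xs(True, level=t).views.sum() for t in all_tags})
print(result)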