Groupby items within a list in a Pandas DataFrame

Rishabh Srivastava

Consider the following DataFrame:

link    tags            views
/a      [tag_a, tag_b]  100
/b      [tag_a, tag_c]  200
/c      [tag_b, tag_c]  150

What would be an efficient way to 'groupby' the items within the lists in the tags column? For instance, if one were to find the cumulative views for each tag in the DataFrame above, the result would be:

tag     views
tag_a   300
tag_b   250
tag_c   350

So far, this is what I have come up with:

# get all unique tags
all_tags = list(set([item for sublist in df.tags.tolist() for item in sublist]))

# sum the views of every row that contains each tag
tag_views = {tag: df[df.tags.map(lambda x: tag in x)].views.sum() for tag in all_tags}

This approach is rather slow for a large dataset, since it scans the whole tags column once per tag. Is there a more efficient way (perhaps using the built-in groupby function) of doing this?

AChampion

You could split the tags column into multiple rows and then groupby:

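import pandas as pd

df = pd.DataFrame(...)  # the DataFrame from the question
# expand the lists into columns, then stack them: one entry per (row, tag)
tag = pd.DataFrame(df.tags.tolist(), index=df.index).stack()
tag.index = tag.index.droplevel(-1)  # drop the position-in-list level
tag.name = 'tag'
# select views explicitly so the link and tags columns are not summed
df.join(tag).groupby('tag')[['views']].sum()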

Result:

       views
tag         
tag_a    300
tag_b    250
tag_c    350

This will not be very space efficient because the join repeats each row once per tag, which matters especially for a high number of tags per URL. For a small number of tags I would be interested to hear about the timings.
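
On pandas 0.25 or later, DataFrame.explode can do the same row-splitting in one call, without building the intermediate frame or joining it back. The following is only a sketch of that alternative, not the approach above; it rebuilds the DataFrame from the question so that it runs on its own, and the names exploded and views_per_tag are made up for illustration:

```python
import pandas as pd

# The example DataFrame from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "link": ["/a", "/b", "/c"],
    "tags": [["tag_a", "tag_b"], ["tag_a", "tag_c"], ["tag_b", "tag_c"]],
    "views": [100, 200, 150],
})

# explode() emits one row per list element and repeats the other columns.
# Only the two columns that are needed are kept, to limit the memory cost.
exploded = df[["tags", "views"]].explode("tags")

# After exploding, an ordinary groupby on the tag values gives the totals.
views_per_tag = exploded.groupby("tags")["views"].sum()
print(views_per_tag)
```

The space concern still applies here: the exploded frame has one row per (link, tag) pair, so it grows with the number of tags per URL.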

Alternatively, use a boolean multi-index with one level per tag:

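df = pd.DataFrame(...)
all_tags = [...]
# for each row, one boolean per tag: does the row carry that tag?
groups = df.tags.map(lambda cell: tuple(tag in cell for tag in all_tags))
# use those booleans as index levels, one level per tag
df.index = pd.MultiIndex.from_tuples(groups.values, names=all_tags)
for t in all_tags:
    # select the rows where level t is True and total their views
    print(t, df.xs(True, level=t).views.sum())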

Result:

tag_a 300
tag_b 250
tag_c 350
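
The same membership information can also be held as ordinary boolean columns instead of index levels, which lets all the per-tag totals be computed at once rather than in a loop. This is a sketch under assumptions, not a tested replacement for the code above: the DataFrame is rebuilt from the question, all_tags is derived from the data, and the names membership and views_per_tag are invented for this example.

```python
import pandas as pd

# The example DataFrame from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame({
    "link": ["/a", "/b", "/c"],
    "tags": [["tag_a", "tag_b"], ["tag_a", "tag_c"], ["tag_b", "tag_c"]],
    "views": [100, 200, 150],
})

# Assumption: the tag list is derived from the data rather than supplied by hand.
all_tags = sorted({tag for cell in df["tags"] for tag in cell})

# One row per link, one boolean column per tag (True if the row has the tag).
membership = pd.DataFrame(
    [[tag in cell for tag in all_tags] for cell in df["tags"]],
    index=df.index,
    columns=all_tags,
)

# Multiply each row of 0/1 flags by that row's views, then sum down each column.
views_per_tag = membership.astype(int).mul(df["views"], axis=0).sum()
print(views_per_tag)
```

Like the multi-index version, this stores one flag per (row, tag) combination, so it is best suited to a modest number of distinct tags.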
