I have the following dataframe (df). All columns contain lists, except Type, which contains strings:
Type Components names
Zebra [hand,arm,nose] [bubu,kuku]
Zebra [eyes,fingers] [gaga,timber]
Zebra [paws] []
Lion [teeth] [scar]
Tiger [fingers] [figgy]
I want to group the rows by Type, concatenating the lists, so that the output looks like this:
Type Components Names
Zebra [hand,arm,nose,eyes,fingers,paws] [bubu,kuku,gaga,timber]
Lion [teeth] [scar]
Tiger [fingers] [figgy]
I tried things like:
df.groupby('Type')
but I also wasn't successful using .agg.
Option 1: groupby + sum
Not optimised, and it does not remove duplicates.
df.groupby('Type', sort=False, as_index=False).sum()
Type Components names
0 Zebra [hand, arm, nose, eyes, fingers, paws] [bubu, kuku, gaga, timber]
1 Lion [teeth] [scar]
2 Tiger [fingers] [figgy]
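If duplicates should be kept but you would rather not rely on summing lists, one possible variant (a sketch, not part of the original answer) is to flatten each group once with itertools.chain.from_iterable. The helper name concat_lists and the DataFrame construction are assumptions that reproduce the question's sample data.

```python
from itertools import chain

import pandas as pd

# Rebuild the question's sample data (assumed column names: Type, Components, names).
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger"],
    "Components": [["hand", "arm", "nose"], ["eyes", "fingers"], ["paws"],
                   ["teeth"], ["fingers"]],
    "names": [["bubu", "kuku"], ["gaga", "timber"], [], ["scar"], ["figgy"]],
})


def concat_lists(series):
    """Hypothetical helper: flatten a group's lists into one list, keeping order and duplicates."""
    return list(chain.from_iterable(series))


# sort=False keeps groups in order of first appearance; as_index=False keeps Type as a column.
out = df.groupby("Type", sort=False, as_index=False).agg(concat_lists)
print(out)
```

This gives the same rows as Option 1, but each group's lists are flattened in a single pass.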
Option 2: groupby + agg + itertools.chain
Removes duplicates (via set) and flattens efficiently with chain.from_iterable. Note that set does not preserve the original order of the elements.
from itertools import chain

df.groupby('Type', sort=False, as_index=False).agg(
    lambda x: list(set(chain.from_iterable(x)))
)
Type Components names
0 Zebra [eyes, hand, paws, arm, fingers, nose] [timber, bubu, gaga, kuku]
1 Lion [teeth] [scar]
2 Tiger [fingers] [figgy]
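Because set does not preserve order (compare the Zebra row above with the input), an order-preserving alternative is to deduplicate with dict.fromkeys, whose keys keep insertion order on Python 3.7+. This is a sketch rather than the original answer's method; the helper name concat_unique and the extra sixth Zebra row (added only so there is a duplicate to drop) are hypothetical.

```python
from itertools import chain

import pandas as pd

# Question's sample data plus one hypothetical extra Zebra row repeating "arm" and "kuku".
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger", "Zebra"],
    "Components": [["hand", "arm", "nose"], ["eyes", "fingers"], ["paws"],
                   ["teeth"], ["fingers"], ["arm"]],
    "names": [["bubu", "kuku"], ["gaga", "timber"], [], ["scar"], ["figgy"], ["kuku"]],
})


def concat_unique(series):
    """Hypothetical helper: flatten a group's lists, dropping repeats but keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(chain.from_iterable(series)))


out = df.groupby("Type", sort=False, as_index=False).agg(concat_unique)
print(out)
```

dict.fromkeys keeps the first occurrence of each element, so the output order follows the input rows. Like the set approach, it requires the list elements to be hashable.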