This is my dataframe:
> df
a b
0 1 set([2, 3])
1 2 set([2, 3])
2 3 set([4, 5, 6])
3 1 set([1, 34, 3, 2])
Now when I groupby on a, I want to merge the sets in b for each group. If b held lists there would be no problem, but with sets the output of my command is:
> df.groupby('a').sum()
a b
1 NaN
2 set([2, 3])
3 set([4, 5, 6])
What should I do in groupby to update sets? The output I'm looking for is as below:
a b
1 set([2, 3, 1, 34])
2 set([2, 3])
3 set([4, 5, 6])
This might be close to what you want:
df.groupby('a').apply(lambda x: set.union(*x.b))
In this case it takes the union of the sets.
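To see what set.union(*x.b) does for a single group, here is a small plain-Python sketch. The list group_sets is a hypothetical stand-in for the values of b in the group where a is 1; it is not part of the original code.

```python
# Hypothetical stand-in for the sets in column b where a == 1.
group_sets = [{2, 3}, {1, 34, 3, 2}]

# The * unpacks the list, so this call is set.union({2, 3}, {1, 34, 3, 2}).
merged = set.union(*group_sets)

# Starting from an empty set gives the same result, and it also
# accepts an empty list of sets instead of raising a TypeError.
merged_safe = set().union(*group_sets)

print(merged)
```

Either form works inside groupby, since a group always has at least one row; the set().union(...) variant is just a little more forgiving.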
If you need to keep the column names you could use:
df.groupby('a').agg({'b':lambda x: set.union(*x)}).reset_index('a')
Result:
a b
0 1 set([1, 2, 3, 34])
1 2 set([2, 3])
2 3 set([4, 5, 6])
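For reference, a self-contained sketch of the same idea on Python 3, where sets print as {2, 3} rather than set([2, 3]). It rebuilds the frame from the question, selects column b before aggregating, and uses apply with set().union; that is one possible variant and assumes a reasonably recent pandas, not necessarily the version used above.

```python
import pandas as pd

# Rebuild the frame from the question using Python 3 set literals.
df = pd.DataFrame({
    "a": [1, 2, 3, 1],
    "b": [{2, 3}, {2, 3}, {4, 5, 6}, {1, 34, 3, 2}],
})

# Select column b, then union the sets within each group of a.
# reset_index() turns the group key back into a regular column.
result = (
    df.groupby("a")["b"]
      .apply(lambda sets: set().union(*sets))
      .reset_index()
)

print(result)
```

The result has one row per value of a, with b holding the union of all sets in that group.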