In a dataframe group rows containing a list over one column

Jeremy Hunts

I have the following dataframe (df). All columns contain lists except Type, which contains strings.

Type    Components        names
Zebra   [hand,arm,nose]   [bubu,kuku]
Zebra   [eyes,fingers]    [gaga,timber]
Zebra   [paws]           []
Lion    [teeth]          [scar]
Tiger   [fingers]        [figgy]

I want to group them based on Type so the output is like this:

Type    Components                           names
Zebra   [hand,arm,nose,eyes,fingers,paws]    [bubu,kuku,gaga,timber]
Lion    [teeth]                              [scar]
Tiger   [fingers]                            [figgy]

I tried things like:

df.groupby('Type')

I also tried .agg, but without success.

cs95

Option 1
groupby + sum
Not optimised (each addition builds a new list), and it does not remove duplicates

df.groupby('Type', sort=False, as_index=False).sum()

    Type                              Components                       names
0  Zebra  [hand, arm, nose, eyes, fingers, paws]  [bubu, kuku, gaga, timber]
1   Lion                                 [teeth]                      [scar]
2  Tiger                               [fingers]                     [figgy]
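
If you want to reproduce this, here is a minimal sketch. The DataFrame construction is my assumption about how the question's data is stored (real Python lists in each cell, not strings). Summing works because adding two lists concatenates them.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: every cell in
# Components and names holds a real Python list.
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger"],
    "Components": [
        ["hand", "arm", "nose"],
        ["eyes", "fingers"],
        ["paws"],
        ["teeth"],
        ["fingers"],
    ],
    "names": [["bubu", "kuku"], ["gaga", "timber"], [], ["scar"], ["figgy"]],
})

# list + list concatenates, so summing each group joins its lists.
# sort=False keeps the groups in order of first appearance.
combined = df.groupby("Type", sort=False, as_index=False).sum()
print(combined)
```

If the cells are actually strings that only look like lists (for example after reading a CSV), they would need to be parsed into lists first.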

Option 2
groupby + agg + itertools.chain
Removes duplicates (via set, so the original element order is not preserved) and flattens efficiently

from itertools import chain
df.groupby('Type', sort=False, as_index=False).agg(
    lambda x: list(set(chain.from_iterable(x)))
)

    Type                              Components                       names
0  Zebra  [eyes, hand, paws, arm, fingers, nose]  [timber, bubu, gaga, kuku]
1   Lion                                 [teeth]                      [scar]
2  Tiger                               [fingers]                     [figgy]
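
Note the shuffled order in the Zebra row above: a set does not keep insertion order. If order matters, one possible variation is to deduplicate with dict.fromkeys, which keeps the first occurrence of each element. This is only a sketch: flatten_unique is a helper name I made up, and the frame is my assumed reconstruction of the question's data.

```python
from itertools import chain

import pandas as pd

# Assumed reconstruction of the question's frame (cells hold real lists).
df = pd.DataFrame({
    "Type": ["Zebra", "Zebra", "Zebra", "Lion", "Tiger"],
    "Components": [
        ["hand", "arm", "nose"],
        ["eyes", "fingers"],
        ["paws"],
        ["teeth"],
        ["fingers"],
    ],
    "names": [["bubu", "kuku"], ["gaga", "timber"], [], ["scar"], ["figgy"]],
})


def flatten_unique(lists):
    """Flatten an iterable of lists, dropping repeats but keeping first-seen order."""
    # dict keys are unique and remember insertion order (Python 3.7+)
    return list(dict.fromkeys(chain.from_iterable(lists)))


ordered = df.groupby("Type", sort=False, as_index=False).agg(flatten_unique)
print(ordered)

# The helper on its own, with a repeated element:
print(flatten_unique([["a", "b"], ["b", "c"]]))
```

Like the set version, this needs the list elements to be hashable.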
