Pandas: create category column based on multiple columns

user987055

What is the most efficient way to create a category column from the combined values of the other columns in each row, so that identical rows get the same category?

input:

   col1  col2  col3  col4
0     0     0   -10     1
1     1   100     0    -1
2     0     0     0     1
3     0     0   -10     1
4     1   100     0    -1

output:

   col1  col2  col3  col4 new_col
0     0     0   -10     1       1
1     1   100     0    -1       2
2     0     0     0     1       3
3     0     0   -10     1       1
4     1   100     0    -1       2
Stef

The fastest method is probably NumPy's np.unique with return_inverse=True (this requires all columns to be numeric):

import numpy as np

_, new_col = np.unique(df.to_numpy(), axis=0, return_inverse=True)
df['new_col'] = new_col

or as a one-liner:

df['new_col'] = np.unique(df.to_numpy(), axis=0, return_inverse=True)[1]

   col1  col2  col3  col4  new_col
0     0     0   -10     1        0
1     1   100     0    -1        2
2     0     0     0     1        1
3     0     0   -10     1        0
4     1   100     0    -1        2
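
For reference, here is a self-contained sketch that rebuilds the sample frame from the question and applies the same call. The codes are 0-based and follow the sorted order of the distinct rows, which is why they differ from the 1, 2, 3 numbering in the question. The ravel() call is a defensive assumption on my part, in case a NumPy version returns the inverse index with an extra dimension.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [0, 1, 0, 0, 1],
    "col2": [0, 100, 0, 0, 100],
    "col3": [-10, 0, 0, -10, 0],
    "col4": [1, -1, 1, 1, -1],
})

# np.unique with axis=0 sorts the distinct rows lexicographically;
# the inverse index maps every original row to its distinct row.
unique_rows, inverse = np.unique(df.to_numpy(), axis=0, return_inverse=True)

# ravel() keeps the codes one-dimensional (defensive, see note above)
codes = inverse.ravel()
df["new_col"] = codes

print(df["new_col"].tolist())  # [0, 2, 1, 0, 2]
```

Here unique_rows holds the three distinct rows, so unique_rows[codes] reconstructs the original array.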

On the sample data this is about 10 times faster than grouping by all columns and using the group number (ngroup) as the category code:

df['new_col'] = df.groupby(df.columns.to_list()).ngroup()

The advantage of the groupby method is that it also works for mixed or non-numeric typed dataframes.
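
If the codes should follow the order in which each distinct row first appears and start at 1, as in the question's expected output, a possible variation is to pass sort=False to groupby and add 1. This is a sketch rather than part of the original answer; the small mixed frame at the end is a made-up example to illustrate non-numeric columns.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "col1": [0, 1, 0, 0, 1],
    "col2": [0, 100, 0, 0, 100],
    "col3": [-10, 0, 0, -10, 0],
    "col4": [1, -1, 1, 1, -1],
})

# Capture the key columns before adding the new one
key_cols = df.columns.to_list()

# sort=False numbers the groups in order of first appearance;
# + 1 shifts the 0-based group numbers to start at 1
df["new_col"] = df.groupby(key_cols, sort=False).ngroup() + 1
print(df["new_col"].tolist())  # [1, 2, 3, 1, 2]

# Hypothetical frame with a string column: groupby handles mixed dtypes
mixed = pd.DataFrame({"a": ["x", "y", "x"], "b": [1, 2, 1]})
mixed_codes = mixed.groupby(["a", "b"], sort=False).ngroup()
print(mixed_codes.tolist())  # [0, 1, 0]
```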
