Efficient/Pythonic way to Filter pandas DataFrame based on priority

Tommy

I have below dataframe.

+-----------+----------+-----+
| InvoiceNo | ItemCode | Qty |
+-----------+----------+-----+
|  Inv-001  |     A    |  2  |
+-----------+----------+-----+
|  Inv-001  |     B    |  3  |
+-----------+----------+-----+
|  Inv-001  |     C    |  1  |
+-----------+----------+-----+
|  Inv-002  |     B    |  3  |
+-----------+----------+-----+
|  Inv-002  |     D    |  4  |
+-----------+----------+-----+
|  Inv-003  |     C    |  3  |
+-----------+----------+-----+
|  Inv-003  |     D    |  9  |
+-----------+----------+-----+
|  Inv-004  |     D    |  5  |
+-----------+----------+-----+
|  Inv-004  |     E    |  8  |
+-----------+----------+-----+
|  Inv-005  |     X    |  2  |
+-----------+----------+-----+

my task is to create an additional column Type based on the priority of the item occurrence.

eg: ItemCode A has 1st Priority. then B has 2nd priority and C has 3rd priority. rest of the items has least priority and classified has Other.

So, if any Invoice contains item A, the type should be Type - A irrespective other items presence. from the balance Invoices if item B contains, then the type should be Type - B. same for C. if none of A, B or C is not present in any invoice, then the type should be Type - Other.

Below is my desired output.

+-----------+----------+-----+--------------+
| InvoiceNo | ItemCode | Qty |     Type     |
+-----------+----------+-----+--------------+
|  Inv-001  |     A    |  2  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     B    |  3  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-001  |     C    |  1  |   Type - A   |
+-----------+----------+-----+--------------+
|  Inv-002  |     B    |  3  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-002  |     D    |  4  |   Type - B   |
+-----------+----------+-----+--------------+
|  Inv-003  |     C    |  3  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-003  |     D    |  9  |   Type - C   |
+-----------+----------+-----+--------------+
|  Inv-004  |     D    |  5  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-004  |     E    |  8  | Type - Other |
+-----------+----------+-----+--------------+
|  Inv-005  |     X    |  2  | Type - Other |
+-----------+----------+-----+--------------+

Below is my code and it works. But, it is more cumbersome and not pythonic at all.

# load Dataframe
df = pd.read_excel() 

# filter data containing `A`
mask_A = (df['ItemCode'] == 'A').groupby(df['InvoiceNo']).transform('any')
df_A = df[mask_A]
df_A['Type'] = 'Type - A'

# form the rest of the data, filter data containing `B`
df = df[~mask_A]
mask_B = (df['ItemCode'] == 'B').groupby(df['InvoiceNo']).transform('any')
df_B = df[mask_B]
df_B['Type'] = 'Type - B'

# form the rest of the data, filter data containing `c`
df = df[~mask_B]
mask_C = (df['ItemCode'] == 'C').groupby(df['InvoiceNo']).transform('any')
df_C = df[mask_C]
df_C['Type'] = 'Type - C'

# form the rest of the data, filter data doesnt contain `A, B or C`
df_Other = df[~mask_C]
df_Other['Type'] = 'Type - Other'

# Conctenate all the dataframes
df = pd.concat([df_A, df_B, df_C, df_Other], axis=0,sort=False)

Now, what is the most efficient and pythonic way to do this?

BENY

I feel like we can do Categorical then transform

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)

df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('min').fillna('other')

Update

df['Type']=pd.Categorical(df.ItemCode,['A','B','C'],ordered=True)
df=df.sort_values('Type')
df['Type']='Type_'+df.groupby('InvoiceNo')['Type'].transform('first').fillna('other')
df=df.sort_index()

df
Out[32]: 
     InvoiceNo ItemCode  Qty        Type
0    Inv-001          A    2      Type_A
1    Inv-001          B    3      Type_A
2    Inv-001          C    1      Type_A
3    Inv-002          B    3      Type_B
4    Inv-002          D    4      Type_B
5    Inv-003          C    3      Type_C
6    Inv-003          D    9      Type_C
7    Inv-004          D    5  Type_other
8    Inv-004          E    8  Type_other
9    Inv-005          X    2  Type_other

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Pandas most efficient way to filter dataframe based on groupby mask

From Dev

Create binary column in pandas dataframe based on priority

From Dev

Filter Pandas DataFrame based on style

From Dev

Filter pandas dataframe columns based on other dataframe

From Dev

Modify Dataframe based on Priority

From Dev

Most efficient way to filter pandas rows based on row-wise comparison to columns from another dataframe?

From Dev

Is there a way to filter a dataframe based on a specific value but also keep all other values for the unique identifier using pandas?

From Dev

Fastest way of filter index values based on list values from multiple columns in Pandas Dataframe?

From Dev

Filter a dataframe based specific condition in pandas

From Python

Filter pandas dataframe based on column list values

From Dev

Filter a pandas dataframe based on a derived field

From Java

Filter Pandas dataframe based on combination of two columns

From Python

Filter a pandas dataframe based on two columns

From Dev

Unable to filter pandas dataframe based on set of substrings

From Python

Filter dataframe based on groupby and pandas series

From Dev

Pandas dataframe split based on a filter on groupby

From Dev

Filter pandas Dataframe based on max values in a column

From Dev

How to filter a pandas dataframe based on the length of a entry

From Dev

Pandas filter DataFrame based on row , column and date

From Dev

How to filter sessions based on their length in a pandas DataFrame

From Dev

Pandas DataFrame: Replace based on filter and regex extract

From Dev

How to filter the items in a DataFrame based on a list in pandas?

From Dev

Filter rows based on certain conditions in pandas dataframe

From Dev

Filter rows of a dataframe based on certain conditions in pandas

From Dev

filter pandas dataframe based in another column

From Dev

Filter a pandas dataframe based on dates and groupby

From Dev

Filter records based on timestamp in pandas dataframe

From Dev

Filter pandas dataframe rows based on multiple conditions

From Dev

A better way to filter a column in a Pandas DataFrame

Related Related

  1. 1

    Pandas most efficient way to filter dataframe based on groupby mask

  2. 2

    Create binary column in pandas dataframe based on priority

  3. 3

    Filter Pandas DataFrame based on style

  4. 4

    Filter pandas dataframe columns based on other dataframe

  5. 5

    Modify Dataframe based on Priority

  6. 6

    Most efficient way to filter pandas rows based on row-wise comparison to columns from another dataframe?

  7. 7

    Is there a way to filter a dataframe based on a specific value but also keep all other values for the unique identifier using pandas?

  8. 8

    Fastest way of filter index values based on list values from multiple columns in Pandas Dataframe?

  9. 9

    Filter a dataframe based specific condition in pandas

  10. 10

    Filter pandas dataframe based on column list values

  11. 11

    Filter a pandas dataframe based on a derived field

  12. 12

    Filter Pandas dataframe based on combination of two columns

  13. 13

    Filter a pandas dataframe based on two columns

  14. 14

    Unable to filter pandas dataframe based on set of substrings

  15. 15

    Filter dataframe based on groupby and pandas series

  16. 16

    Pandas dataframe split based on a filter on groupby

  17. 17

    Filter pandas Dataframe based on max values in a column

  18. 18

    How to filter a pandas dataframe based on the length of a entry

  19. 19

    Pandas filter DataFrame based on row , column and date

  20. 20

    How to filter sessions based on their length in a pandas DataFrame

  21. 21

    Pandas DataFrame: Replace based on filter and regex extract

  22. 22

    How to filter the items in a DataFrame based on a list in pandas?

  23. 23

    Filter rows based on certain conditions in pandas dataframe

  24. 24

    Filter rows of a dataframe based on certain conditions in pandas

  25. 25

    filter pandas dataframe based in another column

  26. 26

    Filter a pandas dataframe based on dates and groupby

  27. 27

    Filter records based on timestamp in pandas dataframe

  28. 28

    Filter pandas dataframe rows based on multiple conditions

  29. 29

    A better way to filter a column in a Pandas DataFrame

HotTag

Archive