Pythonic way to regroup a pandas dataframe using max of a column

Cybernetician Published at Dev

Cybernetician

I have the following data frame that has been obtained by applying df.groupby(['category', 'unit_quantity']).count()

category         unit_quantity  Count
banana           1EA                5
eggs             100G              22
                 100ML              1
full cream milk  100G               5
                 100ML              1
                 1L                38

Let's call this DataFrame grouped. I want to regroup it using the unit_quantity and Count columns so that every row of a category shows that category's most frequent unit_quantity (the one with the largest Count), i.e.:

category         unit_quantity  Count  Most Frequent unit_quantity
banana           1EA                5  1EA
eggs             100G              22  100G
                 100ML              1  100G
full cream milk  100G               5  1L
                 100ML              1  1L
                 1L                38  1L

I tried grouped.groupby(level=1).max(), which gives me:

unit_quantity
100G	22
100ML	1
1EA	5
1L	38

Because the index of this result and the index of grouped do not match, I cannot join them with .merge. Does anyone know how to solve this?

Thanks in advance
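A minimal sketch of one way around the index mismatch described above, assuming grouped is the MultiIndexed frame from the question (rebuilt here by hand from the counts shown): group on the category level rather than level 1, and use transform instead of max. max collapses each group to a single row, which is why its index no longer lines up with grouped; transform returns a result on grouped's own index, so it can be assigned as a new column with no merge at all.

```python
import pandas as pd

# Rebuild the grouped frame from the counts shown in the question
index = pd.MultiIndex.from_tuples(
    [('banana', '1EA'), ('eggs', '100G'), ('eggs', '100ML'),
     ('full cream milk', '100G'), ('full cream milk', '100ML'),
     ('full cream milk', '1L')],
    names=['category', 'unit_quantity'],
)
grouped = pd.DataFrame({'Count': [5, 22, 1, 5, 1, 38]}, index=index)

# Within each category, idxmax returns the full (category, unit_quantity)
# label of the row with the largest Count; [1] keeps the unit_quantity part.
# transform broadcasts that scalar to every row of the group, so the result
# shares grouped's index and can be assigned directly as a column.
grouped['Most Frequent unit_quantity'] = (
    grouped.groupby(level='category')['Count']
    .transform(lambda s: s.idxmax()[1])
)
print(grouped)
```

If two unit_quantity values tie for the largest Count within a category, idxmax picks the first one; the question does not say how ties should be handled.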

tlentali

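Starting from your DataFrame with its index reset (grouped.reset_index()), so that category and unit_quantity are ordinary columns: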

>>> import pandas as pd

>>> df = pd.DataFrame({'category': ['banana', 'eggs', 'eggs', 'full cream milk', 'full cream milk', 'full cream milk'], 
...                    'unit_quantity': ['1EA', '100G', '100ML', '100G', '100ML', '1L'], 
...                    'Count': [5, 22, 1, 5, 1, 38],}, 
...                   index = [0, 1, 2, 3, 4, 5]) 
>>> df
    category    unit_quantity   Count
0   banana                1EA       5
1   eggs                 100G      22
2   eggs                100ML       1
3   full cream milk      100G       5
4   full cream milk     100ML       1
5   full cream milk        1L      38

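You can apply transform('max') to the Count column grouped by unit_quantity to build a boolean mask. Because transform returns one value per original row, aligned with df's index, comparing it with df['Count'] keeps whole rows, including their category and unit_quantity values: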

>>> idx = df.groupby('unit_quantity')['Count'].transform('max') == df['Count']
>>> df[idx]
    category    unit_quantity   Count
0   banana                1EA       5
1   eggs                 100G      22
2   eggs                100ML       1
4   full cream milk     100ML       1
5   full cream milk        1L      38
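The mask above groups by unit_quantity, so it keeps the rows whose Count is the largest for each unit_quantity (both 100ML rows survive because they tie at 1). To produce the Most Frequent unit_quantity column from the question, i.e. the unit_quantity with the largest Count within each category, a minimal sketch on the same flat df groups by category instead, uses idxmax to find the winning row, and maps the result back onto every row with Series.map. The names top_rows and most_frequent are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['banana', 'eggs', 'eggs', 'full cream milk',
                 'full cream milk', 'full cream milk'],
    'unit_quantity': ['1EA', '100G', '100ML', '100G', '100ML', '1L'],
    'Count': [5, 22, 1, 5, 1, 38],
})

# Row label of the largest Count within each category (first one wins on ties)
top_rows = df.groupby('category')['Count'].idxmax()

# Series mapping category -> unit_quantity of that winning row
most_frequent = (
    df.loc[top_rows, ['category', 'unit_quantity']]
    .set_index('category')['unit_quantity']
)

# Broadcast the per-category answer back onto every row of that category
df['Most Frequent unit_quantity'] = df['category'].map(most_frequent)
print(df)
```

Since map looks values up by category, the new column lines up with df row by row, so no merge on mismatched indices is needed.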


edited at 2021-07-11