pandas: for each group, calculate the ratio of two categories and append it as a new column using .pipe()

Tommy Lees Published at Dev


I have a pandas dataframe like the following:

import pandas as pd

df = pd.DataFrame({"AAA":["x1","x1","x1","x2","x2","x2"],
                   "BBB":["y1","y1","y2","y2","y2","y1"],
                   "CCC":["t1","t2","t3","t1","t1","t1"],
                   "DDD":[10,11,18,17,21,30]})
df

Out[1]:
  AAA BBB CCC  DDD
0  x1  y1  t1   10
1  x1  y1  t2   11
2  x1  y2  t3   18
3  x2  y2  t1   17
4  x2  y2  t1   21
5  x2  y1  t1   30

The problem

I want to group on column AAA, which gives two groups: x1 and x2.

Then, for each group, I want to calculate the ratio of y1 to y2 in column BBB and assign the result to a new column, Ratio of BBB.

The desired output

So I want this as my output.

pd.DataFrame({"AAA":["x1","x1","x1","x2","x2","x2"],
              "BBB":["y1","y1","y2","y2","y2","y1"],
              "CCC":["t1","t2","t3","t1","t1","t1"],
              "DDD":[10,11,18,17,21,30],
              "Ratio of BBB":[0.33,0.33,0.33,0.66,0.66,0.66]})

Out[2]:
  AAA BBB CCC  DDD  Ratio of BBB
0  x1  y1  t1   10          0.33
1  x1  y1  t2   11          0.33
2  x1  y2  t3   18          0.33
3  x2  y2  t1   17          0.66
4  x2  y2  t1   21          0.66
5  x2  y1  t1   30          0.66

Current status

I have currently achieved it like so:

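def f(group):
    group = group.copy()
    # count the y1 and y2 rows in this group (the same count is written to every row)
    group["y1"] = (group["BBB"] == "y1").sum()
    group["y2"] = (group["BBB"] == "y2").sum()
    # share of y2 among the y1/y2 rows, e.g. 1 / (2 + 1) = 0.33 for x1
    group["Ratio of BBB"] = group["y2"] / (group["y1"] + group["y2"])
    return group

df.groupby("AAA", group_keys=False).apply(f)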

What I want to achieve

Is there any way to achieve this with the .pipe() function?

I was thinking something like this:

ratio = (df
         .groupby(df.AAA)   # group by a column of the DataFrame (df.colname)
         .BBB               # select column BBB
         .value_counts()    # counts indexed by (AAA, BBB)
         .pipe(lambda counts: counts.xs("y2", level="BBB") / counts.groupby(level="AAA").sum())
         )

Edit: One solution using `pipe()`

N.B.: As user jpp pointed out in a comment below:

unstack / merge / reset_index operations are unnecessary and expensive

However, this is the method I originally had in mind, so I thought I would share it here.

df = (df
      .groupby(df.AAA)                    # group by the key column
      .BBB                                # select the column with the categories (y1 & y2)
      .value_counts()                     # count each category per group (# of y1, # of y2)
      .unstack(fill_value=0)              # turn the category rows into columns (y1, y2); 0 if one is missing
      .pipe(lambda counts: counts["y2"] / counts.sum(axis=1))  # share of y2 per group (outputs a Series)
      .rename("Ratio of BBB")             # name the Series so it becomes the new column
      .reset_index()                      # turn the grouped Series into a DataFrame (AAA, Ratio of BBB)
      .merge(df, on="AAA")                # join back onto the original rows using the key (AAA)
      )
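Following jpp's point, here is a sketch that keeps the .pipe() style but avoids unstack / merge / reset_index. The share of y2 in each group is the mean of a 0/1 "is y2" flag, and groupby(...).transform("mean") broadcasts that value back to every row, so the original row order is kept. This assumes BBB only ever holds y1 or y2; if other values appear, the mean is taken over all rows in the group rather than only the y1/y2 rows. The helper name add_ratio is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2"],
                   "BBB": ["y1", "y1", "y2", "y2", "y2", "y1"],
                   "CCC": ["t1", "t2", "t3", "t1", "t1", "t1"],
                   "DDD": [10, 11, 18, 17, 21, 30]})

def add_ratio(frame):
    # hypothetical helper: 1.0 where BBB is y2, 0.0 otherwise
    is_y2 = frame["BBB"].eq("y2").astype(float)
    # the group mean of the flag is count(y2) / rows in group (assumes only y1/y2 in BBB);
    # transform returns one value per original row, aligned on the index, so no merge is needed
    ratio = is_y2.groupby(frame["AAA"]).transform("mean")
    return frame.assign(**{"Ratio of BBB": ratio})

df = df.pipe(add_ratio)
print(df)
```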

cs95

It looks like you actually want the ratio of y2 to the group total (y1 + y2), not y1 to y2. Use groupby + value_counts:

v = df.groupby('AAA').BBB.value_counts().unstack()
df['RATIO'] = df.AAA.map(v.y2 / (v.y2 + v.y1))

  AAA BBB CCC  DDD     RATIO
0  x1  y1  t1   10  0.333333
1  x1  y1  t2   11  0.333333
2  x1  y2  t3   18  0.333333
3  x2  y2  t1   17  0.666667
4  x2  y2  t1   21  0.666667
5  x2  y1  t1   30  0.666667

To generalise to any number of categories in BBB, divide by the row total instead:

df['RATIO'] = df.AAA.map(v.y2 / v.sum(axis=1))
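With more than two categories, value_counts(normalize=True) returns each category's share of its group directly, which matches dividing by v.sum(axis=1). The sketch below uses a hypothetical third category y3 and a hypothetical group x3 that has no y2 at all; unstack(fill_value=0) turns that missing count into a ratio of 0 instead of NaN.

```python
import pandas as pd

# Question-shaped data plus a hypothetical category "y3" and a hypothetical group "x3" with no y2
df = pd.DataFrame({"AAA": ["x1", "x1", "x1", "x2", "x2", "x2", "x3"],
                   "BBB": ["y1", "y1", "y2", "y2", "y2", "y3", "y1"]})

shares = (df.groupby("AAA")["BBB"]
            .value_counts(normalize=True)  # share of each BBB value within its AAA group
            .unstack(fill_value=0))        # one row per group, one column per category; 0 if absent

# look up each row's group share of y2
df["RATIO"] = df["AAA"].map(shares["y2"])
print(df)
```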


Edited at 2020-11-17
