Subset / group by pandas Data Frame to calculate mean and apply to missing values

kms

I am trying to subset pandas dataframe by a group / category, calculate a statistic and apply it to the original dataframe for missing values in the group.

df1 = pd.DataFrame({
                    'City': ['SF','NYC','SF','NYC','SF','CHI','LA','LA','CHI'], 
                    'Val': [2,4,0,0,7,4,3,5,6] 
                   })

for name, group in df.groupby(['City']):

    dff = df[df['City'] == name]

    # Calculate mean
    df1 = dff[dff['Val'] != 0]
    mean_val = int(df1['Val'].mean())

Now, I need to apply mean_val to all 0s in the subset.

Henry Ecker

We could mask out the 0 values then groupby transform to calculate the mean and fillna to put the means back, lastly convert the column to int using astype:

s = df['Val'].mask(df['Val'].eq(0))
df['Val'] = s.fillna(s.groupby(df['City']).transform('mean')).astype(int)

Or we can boolean index where Val is 0, mask out the 0 values and assign the results of groupby transform back using loc:

m = df['Val'].eq(0)
df.loc[m, 'Val'] = (
    df['Val'].mask(m)
        .groupby(df['City']).transform('mean')
        .astype(int)
)

Both produce: df:

  City  Val
0   SF    2
1  NYC    4
2   SF    4
3  NYC    4
4   SF    7
5  CHI    4
6   LA    3
7   LA    5
8  CHI    6

We could filter dff to get the index locations relative to df and assign back to modify the original approach:

for name, group in df.groupby(['City']):

    dff = df[df['City'] == name]

    # Calculate mean
    df1 = dff[dff['Val'] != 0]
    mean_val = int(df1['Val'].mean())
    # Assign mean back to `df` at index locations where Val is 0 in group
    df.loc[dff[(dff['Val'] == 0)].index, 'Val'] = mean_val

Although looping is highly discouraged in pandas due to increased runtime.

However, if we are going to use the iterable from groupby we should use the values returned instead of filtering from df:

for name, group in df.groupby(['City']):
    # Create Boolean Index
    m = group['Val'] != 0
    # Calculate mean from grouped dataframe `group`
    mean_val = int(group.loc[m, 'Val'].mean())
    # Assign mean back to `df` at index locations where Val is 0 in group
    df.loc[group[~m].index, 'Val'] = mean_val

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'City': ['SF', 'NYC', 'SF', 'NYC', 'SF', 'CHI', 'LA', 'LA', 'CHI'],
    'Val': [2, 4, 0, 0, 7, 4, 3, 5, 6],
})

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Calculate mean from subset of rows in pandas data frame: groupby or for loop?

From Dev

Difference to group mean in a pandas data frame?

From Dev

Group by and apply custom function in pandas data frame

From Dev

Find a subset of columns of a data frame that have some missing values

From Dev

Subset R data frame based on group-level values

From Dev

R subset data.frame columns by group to maximize row values

From Dev

How to subset a data frame using Pandas based on a group criteria?

From Dev

How to calculate Mean of specific values in each row of a data frame?

From Dev

Group by rows with null values in pandas data frame

From Dev

how to calculate mean of data frame inside in series object (pandas)?

From Java

Calculate mean for selected rows for selected columns in pandas data frame

From Dev

How to calculate partial correlations when the data frame contains missing values

From Dev

Pandas:Calculate mean of a group of n values of each columns of a dataframe

From Dev

Pandas: Fill missing values by mean in each group faster than transform

From Dev

Pandas; calculate mean and append mean to original frame

From Dev

Group by multiple column data frame in pandas and get mean value of a column

From Dev

Replace missing values by a group mean

From Dev

Loop for calculating mean of subset of data frame in r

From Dev

subset data.frame and calculate the frequency

From Dev

Group index values based on other index values in pandas Data Frame

From Dev

How to calculate the average of specific values in a column in a pandas data frame?

From Dev

How to calculate average of values in a Python Pandas Data Frame?

From Dev

Apply function row wise on pandas data frame on columns with numerical values

From Dev

Impute missing data with mean by group

From Dev

Properly handling missing values and formatting for pandas data frame printed to tabulate

From Dev

Replace by incremental values into a pandas data-frame if a column value is missing

From Dev

apply a function to a subset of a data frame but retain the whole data frame

From Dev

Subset a data frame and then apply a mathematical question to each subset in a for loop in R

From Dev

How to use apply group of function in R to calculate mean of values with plus delimiter

Related Related

  1. 1

    Calculate mean from subset of rows in pandas data frame: groupby or for loop?

  2. 2

    Difference to group mean in a pandas data frame?

  3. 3

    Group by and apply custom function in pandas data frame

  4. 4

    Find a subset of columns of a data frame that have some missing values

  5. 5

    Subset R data frame based on group-level values

  6. 6

    R subset data.frame columns by group to maximize row values

  7. 7

    How to subset a data frame using Pandas based on a group criteria?

  8. 8

    How to calculate Mean of specific values in each row of a data frame?

  9. 9

    Group by rows with null values in pandas data frame

  10. 10

    how to calculate mean of data frame inside in series object (pandas)?

  11. 11

    Calculate mean for selected rows for selected columns in pandas data frame

  12. 12

    How to calculate partial correlations when the data frame contains missing values

  13. 13

    Pandas:Calculate mean of a group of n values of each columns of a dataframe

  14. 14

    Pandas: Fill missing values by mean in each group faster than transform

  15. 15

    Pandas; calculate mean and append mean to original frame

  16. 16

    Group by multiple column data frame in pandas and get mean value of a column

  17. 17

    Replace missing values by a group mean

  18. 18

    Loop for calculating mean of subset of data frame in r

  19. 19

    subset data.frame and calculate the frequency

  20. 20

    Group index values based on other index values in pandas Data Frame

  21. 21

    How to calculate the average of specific values in a column in a pandas data frame?

  22. 22

    How to calculate average of values in a Python Pandas Data Frame?

  23. 23

    Apply function row wise on pandas data frame on columns with numerical values

  24. 24

    Impute missing data with mean by group

  25. 25

    Properly handling missing values and formatting for pandas data frame printed to tabulate

  26. 26

    Replace by incremental values into a pandas data-frame if a column value is missing

  27. 27

    apply a function to a subset of a data frame but retain the whole data frame

  28. 28

    Subset a data frame and then apply a mathematical question to each subset in a for loop in R

  29. 29

    How to use apply group of function in R to calculate mean of values with plus delimiter

HotTag

Archive