Calculate mean of df, BUT if >=1 of the values differs >20% from this mean, the mean is set to NaN

Matthi9000 Published at Python

Matthi9000 :

I want to calculate the mean of columns a, b, c, d of the dataframe, BUT if one of the four values in a row differs more than 20% from this mean (of the four values), the mean has to be set to NaN.

Calculating the mean of the 4 columns is easy, but I'm stuck at defining the condition: if any value in the data row falls outside the interval [mean*0.8, mean*1.2], then the mean should be set to NaN.

In the example, one or more of the values in ID:5 and ID:87 don't fit in the interval, and therefore the mean is set to NaN. (NaN values in the initial dataframe are ignored when calculating the mean and when applying the 20% condition to the calculated mean.)

So I'm trying to calculate the mean only for the data rows with no 'outliers'.
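
To make the condition concrete, here is a minimal sketch in plain Python that checks one row at a time. The helper name `row_mean_or_nan` and its `tol` parameter are made up for this illustration; it assumes that "differs more than 20%" means lying outside the interval [mean*0.8, mean*1.2] and that the row mean is positive, as in the example data.

```python
import math

def row_mean_or_nan(values, tol=0.2):
    """Hypothetical helper: mean of the non-NaN values, or NaN if any
    of them lies outside [mean*(1-tol), mean*(1+tol)]."""
    vals = [v for v in values if not math.isnan(v)]  # ignore missing values
    if not vals:
        return math.nan
    mean = sum(vals) / len(vals)
    lo, hi = mean * (1 - tol), mean * (1 + tol)      # assumes mean > 0
    if any(v < lo or v > hi for v in vals):
        return math.nan
    return mean

print(row_mean_or_nan([31, 32, 31, 31]))        # ID 2: band is 25.0..37.5
print(row_mean_or_nan([33, 52, 159, 2]))        # ID 5: mean 61.5, band 49.2..73.8
print(row_mean_or_nan([51, math.nan, 52, 51]))  # ID 7: the NaN is skipped
```

For ID 5 only the value 52 lies inside the band, so the row counts as containing outliers.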

Initial df:

 ID   a    b    c   d
  2  31   32   31  31
  5  33   52  159   2
  7  51  NaN   52  51 
 87  30   52  421   2
 90  10   11   10  11
102  41   42  NaN  42

Desired df:

 ID   a    b    c   d    mean
  2  31   32   31  31   31.25
  5  33   52  159   2     NaN
  7  51  NaN   52  51   51.33
 87  30   52  421   2     NaN
 90  10   11   10  11   10.50
102  41   42  NaN  42   41.67

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"ID": [2,5,7,87,90,102],
                   "a": [31,33,51,30,10,41],
                   "b": [32,52,np.nan,52,11,42],
                   "c": [31,159,52,421,10,np.nan],
                   "d": [31,2,51,2,11,42]})
print(df)

a = df.loc[:, ['a','b','c','d']]
df['mean'] = a.mean(axis=1)
print(df)


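# df.mean is a method, so the column has to be accessed as df['mean'];
# chained comparisons do not work on arrays, so the two tests are combined with &
b = (df['mean'].values[:,None]*0.8 < a.values) & (a.values < df['mean'].values[:,None]*1.2)
print(b)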
...

Quang Hoang :

IIUC:

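# extract the value columns (everything after ID)
s = df.iloc[:,1:]

# calculate the row-wise mean (NaN values are skipped by default)
mean = s.mean(axis=1)

# True where a value lies outside the 20% band around its row mean
# (comparisons with NaN are False, so missing values are ignored)
mask = s.lt(mean*.8, axis=0) | s.gt(mean*1.2, axis=0)

# set the mean to NaN for every row with at least one violation
df['mean'] = mean.mask(mask.any(axis=1))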

Output:

    ID   a     b      c   d       mean
0    2  31  32.0   31.0  31  31.250000
1    5  33  52.0  159.0   2        NaN
2    7  51   NaN   52.0  51  51.333333
3   87  30  52.0  421.0   2        NaN
4   90  10  11.0   10.0  11  10.500000
5  102  41  42.0    NaN  42  41.666667
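
For reuse on other column sets, the same idea can be wrapped in a function. This is only a sketch: the name `mean_without_outlier_rows` and the `tol` parameter are hypothetical, and it tests the absolute deviation from the row mean rather than the two-sided band used above, a small variation that should also behave sensibly for negative means.

```python
import numpy as np
import pandas as pd

def mean_without_outlier_rows(df, cols, tol=0.2):
    """Hypothetical helper: row mean over `cols`, set to NaN when any value
    deviates from that mean by more than `tol` (relative). NaNs are ignored."""
    s = df[cols]
    mean = s.mean(axis=1)                      # skips NaN by default
    deviation = s.sub(mean, axis=0).abs()      # |value - row mean|
    # NaN cells compare as False, so they never count as outliers
    outlier = deviation.gt(mean.abs() * tol, axis=0)
    return mean.mask(outlier.any(axis=1))

df = pd.DataFrame({"ID": [2, 5, 7, 87, 90, 102],
                   "a": [31, 33, 51, 30, 10, 41],
                   "b": [32, 52, np.nan, 52, 11, 42],
                   "c": [31, 159, 52, 421, 10, np.nan],
                   "d": [31, 2, 51, 2, 11, 42]})
df["mean"] = mean_without_outlier_rows(df, ["a", "b", "c", "d"])
print(df)
```

Passing the column names explicitly avoids depending on column positions, which matters once extra columns such as `mean` have been added to the frame.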


edited at 2020-06-2
