Calculate the row mean of a df, BUT if >=1 of the values differs >20% from this mean, the mean is set to NaN

Matthi9000 :

I want to calculate the mean of columns a, b, c, d of the dataframe, BUT if any of the four values in a row differs by more than 20% from this mean (of the four values), the mean has to be set to NaN.

Calculating the mean of the 4 columns is easy, but I'm stuck on defining the condition: if any value in the data row falls outside the interval [mean*0.8, mean*1.2], then mean = NaN.

In the example, one or more of the values in ID:5 and ID:87 don't fit in the interval and therefore the mean is set to NaN. (NaN values in the initial dataframe are ignored when calculating the mean and when applying the 20% condition to the calculated mean.)

So I'm trying to calculate the mean only for the data rows with no 'outliers'.

Initial df:

 ID   a    b    c   d
  2  31   32   31  31
  5  33   52  159   2
  7  51  NaN   52  51 
 87  30   52  421   2
 90  10   11   10  11
102  41   42  NaN  42

Desired df:

 ID   a    b    c   d    mean
  2  31   32   31  31   31.25
  5  33   52  159   2     NaN
  7  51  NaN   52  51   51.33
 87  30   52  421   2     NaN
 90  10   11   10  11   10.50
102  41   42  NaN  42   41.67

Code:

import pandas as pd
import numpy as np

df = pd.DataFrame({"ID": [2,5,7,87,90,102],
                   "a": [31,33,51,30,10,41],
                   "b": [32,52,np.nan,52,11,42],
                   "c": [31,159,52,421,10,np.nan],
                   "d": [31,2,51,2,11,42]})
print(df)

a = df.loc[:, ['a','b','c','d']]
df['mean'] = (a.iloc[:,0:]).mean(1)
print(df)

b = df.mean.values[:,None]*0.8 < a.values[:,:] < df.mean.values[:,None]*1.2
print(b)
...
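
A side note on the last line of the attempt above: Python evaluates a chained comparison such as `lo < x < hi` as `(lo < x) and (x < hi)`, and `and` needs a single truth value, so NumPy arrays raise a ValueError there. In addition, `df.mean` refers to the DataFrame method, so the column has to be accessed as `df['mean']`. Below is a minimal sketch of the element-wise form on two rows of the example data; the names `vals`, `row_mean`, `chained_ok` and `inside` are made up for illustration.

```python
import numpy as np

# Two example rows taken from the question's data (IDs 2 and 5).
vals = np.array([[31.0, 32.0, 31.0, 31.0],
                 [33.0, 52.0, 159.0, 2.0]])

# Row means as a column vector so they broadcast against vals.
row_mean = vals.mean(axis=1)[:, None]

# The chained form raises because `and` needs a single truth value.
try:
    row_mean * 0.8 < vals < row_mean * 1.2
    chained_ok = True
except ValueError:
    chained_ok = False

# Element-wise version: combine the two comparisons with `&`.
inside = (row_mean * 0.8 <= vals) & (vals <= row_mean * 1.2)

# True for rows where every value lies within 20% of the row mean.
print(inside.all(axis=1))
```

Note that this plain NumPy version does not skip NaN the way `DataFrame.mean` does, so it only illustrates the comparison itself.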


Quang Hoang :

IIUC:

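# extract the value columns (everything except ID)
s = df.iloc[:,1:]

# calculate the row mean (NaN is skipped by default)
mean = s.mean(axis=1)

# where the condition is violated (comparisons with NaN are False)
mask = s.lt(mean*.8, axis=0) | s.gt(mean*1.2, axis=0)

# set the mean to NaN on rows where any value violates the condition
# (axis must be passed by keyword in pandas 2.x)
df['mean'] = mean.mask(mask.any(axis=1))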

Output:

    ID   a     b      c   d       mean
0    2  31  32.0   31.0  31  31.250000
1    5  33  52.0  159.0   2        NaN
2    7  51   NaN   52.0  51  51.333333
3   87  30  52.0  421.0   2        NaN
4   90  10  11.0   10.0  11  10.500000
5  102  41  42.0    NaN  42  41.666667
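
The same idea can be wrapped in a small helper so that the tolerance becomes a parameter. This is only a sketch under stated assumptions: the function name `mean_within_tolerance` and its arguments `cols` and `tol` are made up for illustration and are not part of pandas. It relies on comparisons with NaN evaluating to False, so missing values never trigger the mask.

```python
import numpy as np
import pandas as pd

def mean_within_tolerance(df, cols, tol=0.2):
    """Row mean of `cols`, or NaN if any non-missing value deviates
    from that mean by more than `tol` (relative to the mean)."""
    s = df[cols]
    mean = s.mean(axis=1)  # skips NaN by default
    # Absolute deviation from the row mean; NaN cells stay NaN
    # and therefore compare as False in gt().
    too_far = s.sub(mean, axis=0).abs().gt(tol * mean.abs(), axis=0)
    # Replace the mean with NaN on rows with at least one violation.
    return mean.mask(too_far.any(axis=1))

df = pd.DataFrame({"ID": [2, 5, 7, 87, 90, 102],
                   "a": [31, 33, 51, 30, 10, 41],
                   "b": [32, 52, np.nan, 52, 11, 42],
                   "c": [31, 159, 52, 421, 10, np.nan],
                   "d": [31, 2, 51, 2, 11, 42]})

df["mean"] = mean_within_tolerance(df, ["a", "b", "c", "d"])
print(df)
```

Selecting the columns by name rather than by position keeps the helper from picking up an already existing 'mean' column when it is run a second time.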
