Mean of the first x values

David Glasser

Let's say I have a table as follows. Each row represents a cohort and each column a time period. Some values are nan as that time period hasn't occurred yet.

For each column, I want to get the mean of the first four values (excluding nan). Is there an easy way to accomplish this in pandas?

| name     |   1 |          2 |           3 |          4 |            5 |            6 |           7 |            8 |            9 |           10 |           11 |           12 |
|----------|----:|-----------:|------------:|-----------:|-------------:|-------------:|------------:|-------------:|-------------:|-------------:|-------------:|-------------:|
| 2019     | nan | -0.0127273 | -0.0202578  | -1         | nan          | nan          | nan         | nan          | nan          | nan          | nan          | nan          |
| 2018     | nan | -0.0245184 | -0.0143627  | -0.0309654 |  -0.0131579  |  -0.0171429  |  -1         | nan          | nan          | nan          | nan          | nan          |
| 2017     | nan | -0.0122807 | -0.00888099 | -0.0340502 |  -0.0222635  |  -0.0151803  |  -0.0346821 |   0.011976   |  -0.0118343  |  -1          | nan          | nan          |
| 2016     | nan | -0.0196429 | -0.010929   | -0.0570902 |  -0.0195312  |   0.00199203 |  -0.027833  |  -0.0102249  |  -0.0103306  |   0.00835073 |  -0.00828157 |  -0.0605428  |
| 2015     | nan | -0.0158451 | -0.0125224  | -0.0525362 |  -0.0114723  |  -0.0154739  |  -0.0314342 |  -0.00811359 |  -0.00817996 |  -0.00206186 |   0.00206612 |  -0.0206186  |
| 2014     | nan | -0.0175439 | -0.00892857 | -0.0486486 |  -0.00757576 |  -0.0114504  |  -0.030888  |  -0.0059761  |  -0.0140281  |   0.0121951  |  -0.00803213 |  -0.0425101  |
| 2013     | nan | -0.0250447 | -0.00550459 | -0.0682657 |  -0.0158416  |  -0.00402414 |  -0.0323232 |  -0.0167015  |  -0.00212314 |  -0.0191489  |   0.0021692  |  -0.017316   |
| 2012     | nan | -0.0118443 | -0.00171233 | -0.0703259 |  -0.00184502 |   0.00554529 |  -0.0294118 |  -0.00757576 |  -0.00763359 |  -0.00769231 |  -0.00387597 |   0.00194553 |
| 2011     | nan | -0.0174825 | -0.0106762  | -0.0377698 |  -0.0261682  |  -0.00767754 |  -0.0174081 |  -0.015748   |  -0.004      |  -0.0200803  |   0.00409836 |  -0.0285714  |
| 2010     | nan | -0.0234657 | -0.00369686 | -0.0352505 |  -0.0269231  |   0.00395257 |  -0.019685  |  -0.0100402  |  -0.00608519 |  -0.0102041  |  -0.0123711  |  -0.00626305 |
Manakin

IIUC, we can stack to remove any NaN values, then apply a groupby to get the count of each value, creating a pseudo index of sorts.

We can then pass a final groupby and concat the mean at the bottom.

s = df.iloc[:, 1:].T.stack().to_frame("vals")
s["idx"] = s.groupby(level=0).cumcount()
s1 = (
    s[s["idx"].le(3)]
    .groupby(level=0)
    .mean()
    .drop("idx", 1)
    .stack()
    .unstack(0)
    .reset_index(0, drop=True)
)
# s1.columns = s1.columns.astype(str) if columns are strings.

df_mean = pd.concat([df,s1])

Another method would be to melt with append.

s = pd.melt(df.iloc[:,1:]).dropna()
s1 = s.assign(key=s.groupby("variable").cumcount()).query("key <= 3").groupby("variable")[
    "value"
].mean().to_frame('mean').T
df_mean = df.append(s1)

        name   1         2         3         4         5         6         7  \
1     2019.0 NaN -0.012727 -0.020258 -1.000000       NaN       NaN       NaN   
2     2018.0 NaN -0.024518 -0.014363 -0.030965 -0.013158 -0.017143 -1.000000   
3     2017.0 NaN -0.012281 -0.008881 -0.034050 -0.022263 -0.015180 -0.034682   
4     2016.0 NaN -0.019643 -0.010929 -0.057090 -0.019531  0.001992 -0.027833   
5     2015.0 NaN -0.015845 -0.012522 -0.052536 -0.011472 -0.015474 -0.031434   
6     2014.0 NaN -0.017544 -0.008929 -0.048649 -0.007576 -0.011450 -0.030888   
7     2013.0 NaN -0.025045 -0.005505 -0.068266 -0.015842 -0.004024 -0.032323   
8     2012.0 NaN -0.011844 -0.001712 -0.070326 -0.001845  0.005545 -0.029412   
9     2011.0 NaN -0.017483 -0.010676 -0.037770 -0.026168 -0.007678 -0.017408   
10    2010.0 NaN -0.023466 -0.003697 -0.035250 -0.026923  0.003953 -0.019685   
mean     NaN NaN -0.017292 -0.013608 -0.280526 -0.016606 -0.011451 -0.273487   

             8         9        10        11        12  
1          NaN       NaN       NaN       NaN       NaN  
2          NaN       NaN       NaN       NaN       NaN  
3     0.011976 -0.011834 -1.000000       NaN       NaN  
4    -0.010225 -0.010331  0.008351 -0.008282 -0.060543  
5    -0.008114 -0.008180 -0.002062  0.002066 -0.020619  
6    -0.005976 -0.014028  0.012195 -0.008032 -0.042510  
7    -0.016702 -0.002123 -0.019149  0.002169 -0.017316  
8    -0.007576 -0.007634 -0.007692 -0.003876  0.001946  
9    -0.015748 -0.004000 -0.020080  0.004098 -0.028571  
10   -0.010040 -0.006085 -0.010204 -0.012371 -0.006263  
mean -0.003085 -0.011093 -0.245379 -0.003020 -0.035247 

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Sum all x-values and subtract by their mean

From Dev

How to find mean across rows, grouped by first row values?

From Dev

r - Calculated mean and sum values group by the first row

From Dev

Linq replace values until first X value

From Dev

Count the first x integers of a group of values

From Dev

Is there any command to find mean of first 5 values then next 5 values for a total of 1000 values in a vector in MATLAB

From Dev

Getting a mean curve of several curves with x-values not being the same

From Dev

R Studio getting the mean of vector of first 5 values, next 5 values etc

From Dev

Count all values in object where X is the first key letter

From Dev

Return first x values from each key in a hash

From Dev

How to select first X values of Pivot Filter, in Excel

From Dev

Python pandas: Sum first n rows of grouped values x

From Dev

Squash dataframe by a certain column to rows that contains first and last timestamps and mean of values

From Dev

python How to replace the first f(x) values of each x in array(x,n)

From Dev

2D bin (x,y) and calculate mean of values (c) of 10 deepest data points (z)

From Dev

What is mean by 'cross-validation with non-unique 'x' values seems doubtful' in Smoothing spline in R?

From Dev

How to find grid neighbours (x, y as integers) group them and calculate mean of their values in spark

From Dev

How to count the number of cell pairs containing x and y values larger/smaller than the mean in excel?

From Dev

Generate list of n (strictly positive) values such that the list has predetermined mean x and std. dev. y

From Java

Numpy mean of nonzero values

From Dev

Finding the mean of certain values

From Dev

Calculate mean of calculated values

From Dev

sorting a dictionary by mean of their values

From Dev

Filling null values with mean

From Dev

Continuous calculation of the mean of values?

From Dev

Calculate mean difference values

From Dev

Replacing missing values with mean

From Dev

Replace missing values with mean

From Dev

Mean of values in pandas dataframe

Related Related

  1. 1

    Sum all x-values and subtract by their mean

  2. 2

    How to find mean across rows, grouped by first row values?

  3. 3

    r - Calculated mean and sum values group by the first row

  4. 4

    Linq replace values until first X value

  5. 5

    Count the first x integers of a group of values

  6. 6

    Is there any command to find mean of first 5 values then next 5 values for a total of 1000 values in a vector in MATLAB

  7. 7

    Getting a mean curve of several curves with x-values not being the same

  8. 8

    R Studio getting the mean of vector of first 5 values, next 5 values etc

  9. 9

    Count all values in object where X is the first key letter

  10. 10

    Return first x values from each key in a hash

  11. 11

    How to select first X values of Pivot Filter, in Excel

  12. 12

    Python pandas: Sum first n rows of grouped values x

  13. 13

    Squash dataframe by a certain column to rows that contains first and last timestamps and mean of values

  14. 14

    python How to replace the first f(x) values of each x in array(x,n)

  15. 15

    2D bin (x,y) and calculate mean of values (c) of 10 deepest data points (z)

  16. 16

    What is mean by 'cross-validation with non-unique 'x' values seems doubtful' in Smoothing spline in R?

  17. 17

    How to find grid neighbours (x, y as integers) group them and calculate mean of their values in spark

  18. 18

    How to count the number of cell pairs containing x and y values larger/smaller than the mean in excel?

  19. 19

    Generate list of n (strictly positive) values such that the list has predetermined mean x and std. dev. y

  20. 20

    Numpy mean of nonzero values

  21. 21

    Finding the mean of certain values

  22. 22

    Calculate mean of calculated values

  23. 23

    sorting a dictionary by mean of their values

  24. 24

    Filling null values with mean

  25. 25

    Continuous calculation of the mean of values?

  26. 26

    Calculate mean difference values

  27. 27

    Replacing missing values with mean

  28. 28

    Replace missing values with mean

  29. 29

    Mean of values in pandas dataframe

HotTag

Archive