r - Calculated mean and sum values group by the first row

user7987850 Published at Dev

user7987850

I have a dataframe, I would like to calculate all the mean values of x and all the sum of y group by the first row of the dateframe.

The dateframe to be calculate

The following link is the result I want. The result expected

Here are the data.

dt=structure(list(year = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("1980", 
    "1981", "1982", "1985", "group"), class = "factor"), x1 = structure(c(4L, 
    1L, 3L, 2L, 1L), .Label = c("1", "2", "4", "A"), class = "factor"), 
        y1 = structure(c(4L, 1L, 3L, 2L, 2L), .Label = c("1", "3", 
        "5", "A"), class = "factor"), x2 = structure(c(5L, 1L, 4L, 
        3L, 2L), .Label = c("2", "4", "5", "6", "A"), class = "factor"), 
        y2 = structure(c(4L, 1L, 3L, 3L, 2L), .Label = c("3", "5", 
        "7", "A"), class = "factor"), x3 = structure(c(4L, 1L, 3L, 
        2L, 1L), .Label = c("4", "6", "8", "B"), class = "factor"), 
        y3 = structure(c(4L, 1L, 3L, 2L, 1L), .Label = c("3", "5", 
        "6", "B"), class = "factor"), x4 = structure(c(4L, 1L, 3L, 
        2L, 3L), .Label = c("2", "4", "5", "C"), class = "factor"), 
        y4 = structure(c(5L, 1L, 2L, 3L, 4L), .Label = c("3", "4", 
        "5", "6", "C"), class = "factor"), x5 = structure(c(5L, 2L, 
        1L, 3L, 4L), .Label = c("3", "4", "6", "7", "C"), class = "factor"), 
        y5 = structure(c(4L, 2L, 1L, 3L, 2L), .Label = c("2", "5", 
        "8", "C"), class = "factor")), class = "data.frame", row.names = c(NA, 
    -5L))

And result expected,

result_expected <- structure(list(year = c(1980L, 1981L, 1982L, 1985L), A_x_mean = c(1.5, 
5, 3.5, 2.5), A_y_sum = c(4L, 12L, 10L, 8L), B_x_mean = c(4L, 
8L, 6L, 4L), B_y_sum = c(3L, 6L, 5L, 3L), C_x_mean = 3:6, C_y_sum = c(8L, 
6L, 13L, 11L)), class = "data.frame", row.names = c(NA, -4L))

I have search key words in goole and stackoverflow, but no proper answers. My current thinking is to calculate unique group A,B,C in first row.

require(tidyverse)
group_variables <- dt%>%gather(key,value)%>%distinct(value)%>%arrange(value)

then get the row in group_variables by the for

for i in group_variables{......}

or can I change the structure of the dataframe by gathe and spread in tidyr,and by dplyr method, something just like following code,

dt_new%>% group_by (group)%>%
          summarise(mean=mean(x,na.rm=TRUE),
          sum=sum(x,na.rm=TURE))

StupidWolf

First we need to take out the first row having the group, make the data frame long, simplify x1,x2,x3 to x etc and put the groups back:

group_var = sapply(dt[1,-1],as.character)
mat <- 
dt[-1,] %>% pivot_longer(-year) %>% 
   mutate(value=as.numeric(as.character(value))) %>% 
   mutate(group=as.character(group_var[as.character(name)])) %>% 
   mutate(name=substr(name,1,1))

mat
# A tibble: 40 x 4
   year  name  value group
   <fct> <chr> <dbl> <chr>
 1 1980  x         1 A    
 2 1980  y         1 A    
 3 1980  x         2 A    
 4 1980  y         3 A    
 5 1980  x         4 B    
 6 1980  y         3 B    
 7 1980  x         2 C    
 8 1980  y         3 C    
 9 1980  x         4 C    
10 1980  y         5 C

Now what's left is to group them according to year, name and group and do the respective function, so we define a function:

func = function(DF,func){
DF %>% 
   group_by(group,name,year) %>% 
   summarise_all(func) %>%
   mutate(label=paste(group,name,func,sep="_")) %>%
   ungroup %>%
   select(year,value,label) %>%
   pivot_wider(values_from=value,names_from=label)
}

And we apply it over two parts of the data:

cbind(func(mat %>% filter(name=="x"),"mean"),func(mat %>% filter(name=="y"),"sum"))

year A_x_mean B_x_mean C_x_mean year A_y_sum B_y_sum C_y_sum
1 1980      1.5        4        3 1980       4       3       8
2 1981      5.0        8        4 1981      12       6       6
3 1982      3.5        6        5 1982      10       5      13
4 1985      2.5        4        6 1985       8       3      11

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at2020-12-12

Comments

0 comments

From Dev

Related Related

Article

r - Calculated mean and sum values group by the first row

r - Calculated mean and sum values group by the first row

R insert row with mean after group of values

Group and sum by row in R

Calculate mean of calculated values

Calculated column with the sum of values from many columns in a row

R: Group in sparklyr ("sum", "count distinct" , "mean")

sum values from a previous row by group

SQL: Group by Case Statement to sum Row Values

How to return values only for the first row in a group

Group rows into a new row and sum in r

row and column matrix sum in R by group

R First Row By Group When Condition Is Met

R how Sum values by group by date

R - Sum row values in a specific consecutive fashion

R - sum every two rows and divide by the first row in that sum

How to find mean across rows, grouped by first row values?

Calculate sum and average of a column in a pyspark dataframe and create a new row for the calculated values

Calculated mean column for a group transposed in a dataframe

R compute mean and sum of value in dataframe using group_by

R: group_id by changing row values

Get sum of values from last nth row by group id

Don't show row based on condition but sum values in parent group

Mysql Query sum and group by not showing a row with null values

Cumulative sum of first occurence of consecutive True values in a group in Pandas

Row-wise sum of first, second and third highest values

Pandas: Sum of first N non-missing values per row

Clear the variables except first row in a group after separate row R

How to adding row based on the information of first row of each group in R

Use the current calculated value from a group by sum to calculate another value in R using data.table

Summarise to get mean and first value of group by in data.table r