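I have a large dataset. I group it by year, select 7 variables, and then use summarise to try to get statistics for each variable within each group. However, I only get one set of statistics per group, not one per variable. How should I interpret this result, and how can I get the statistics for every variable?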
v <- colnames(Cashflow)[c(2, 4:ncol(Cashflow))]
Cstats <- Cashflow %>%
  group_by(Y) %>%
  summarise(mean = mean(get(v), na.rm = TRUE),
            observation = n(),
            sd = sd(get(v), na.rm = TRUE),
            min = min(get(v), na.rm = TRUE),
            q25 = quantile(get(v), probs = c(0.25), na.rm = TRUE),
            median = median(get(v), na.rm = TRUE),
            q75 = quantile(get(v), probs = c(0.75), na.rm = TRUE),
            max = max(get(v), na.rm = TRUE))
My result looks like this:
year mean sd min
1997 1 2 3
1998 2 3 4
Once I add a for loop:
for (name in v){
Cashflow%>%
group_by(Y)%>%
summarise(mean = mean(get(name),na.rm = TRUE),
observation = n(),
sd = sd(get(name),na.rm = TRUE),
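I get this error:
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)
`summarise()` ungrouping output (override with `.groups` argument)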
Can anyone give me some advice?
If you want to do this for multiple columns, use across instead of get (get returns only the value of the first column).
library(dplyr)

# Apply each summary function to every column named in v, within each year.
# all_of() tells tidyselect that v is a character vector of column names.
Cashflow %>%
  group_by(Y) %>%
  summarise(across(all_of(v),
                   list(mean = ~ mean(., na.rm = TRUE),
                        sd = ~ sd(., na.rm = TRUE),
                        min = ~ min(., na.rm = TRUE),
                        median = ~ median(., na.rm = TRUE),
                        q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                        q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
            observation = n(), .groups = 'drop')
Using a reproducible example:
data(mtcars)
v <- names(mtcars)[c(1, 3:7)]
# Output columns are named <variable>_<statistic>, e.g. mpg_mean
mtcars %>%
  group_by(gear) %>%
  summarise(across(all_of(v),
                   list(mean = ~ mean(., na.rm = TRUE),
                        sd = ~ sd(., na.rm = TRUE),
                        min = ~ min(., na.rm = TRUE),
                        median = ~ median(., na.rm = TRUE),
                        q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                        q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
            observation = n(), .groups = 'drop')
# A tibble: 3 x 39
# gear mpg_mean mpg_sd mpg_min mpg_median mpg_q25 mpg_q75 disp_mean disp_sd disp_min disp_median disp_q25 disp_q75 hp_mean hp_sd
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 3 16.1 3.37 10.4 15.5 14.5 18.4 326. 94.9 120. 318 276. 380 176. 47.7
#2 4 24.5 5.28 17.8 22.8 21 28.1 123. 38.9 71.1 131. 78.9 160 89.5 25.9
#3 5 21.4 6.66 15 19.7 15.8 26 202. 115. 95.1 145 120. 301 196. 103.
# … with 24 more variables: hp_min <dbl>, hp_median <dbl>, hp_q25 <dbl>, hp_q75 <dbl>, drat_mean <dbl>, drat_sd <dbl>,
# drat_min <dbl>, drat_median <dbl>, drat_q25 <dbl>, drat_q75 <dbl>, wt_mean <dbl>, wt_sd <dbl>, wt_min <dbl>, wt_median <dbl>,
# wt_q25 <dbl>, wt_q75 <dbl>, qsec_mean <dbl>, qsec_sd <dbl>, qsec_min <dbl>, qsec_median <dbl>, qsec_q25 <dbl>, qsec_q75 <dbl>,
# observation <int>
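The wide output above puts every variable and statistic pair in its own column. If one row per group and variable is easier to read, a possible alternative is to reshape the data to long format first. This is only a sketch and not part of the original answer: it assumes the tidyr package is installed, and the names variable, value and long_stats are made up for illustration.

```r
library(dplyr)
library(tidyr)

data(mtcars)
v <- names(mtcars)[c(1, 3:7)]

# Stack the selected columns into name/value pairs, then summarise
# once per (gear, variable) combination.
long_stats <- mtcars %>%
  select(gear, all_of(v)) %>%
  pivot_longer(cols = all_of(v),
               names_to = "variable",   # illustrative column name
               values_to = "value") %>% # illustrative column name
  group_by(gear, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            min = min(value, na.rm = TRUE),
            q25 = quantile(value, probs = 0.25, na.rm = TRUE),
            median = median(value, na.rm = TRUE),
            q75 = quantile(value, probs = 0.75, na.rm = TRUE),
            max = max(value, na.rm = TRUE),
            observation = n(),
            .groups = "drop")

long_stats
```

With 3 gear levels and 6 variables this should give 18 rows, one per group and variable, with the statistics as columns. For the Cashflow data the same pattern would presumably apply with Y in place of gear.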