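I am trying to apply a filter inside mutate, but I have not found the right way to apply it while keeping the data frame's grouping intact.
Here is a simple reproducible example: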
# Sample data
my_dates = seq(as.Date("2020/1/1"), by = "month", length.out = 6)
grp = c(rep("A",3), rep("B", 3))
x = c(2,4,6,8,10,12)
my_df <- data.frame(my_dates, grp, x)
my_dates grp x
1 2020-01-01 A 2
2 2020-02-01 A 4
3 2020-03-01 A 6
4 2020-04-01 B 8
5 2020-05-01 B 10
6 2020-06-01 B 12
# Pick a max date for which the data will be filtered
max_date <- "2020-05-01"
# Try to get the mean of x by group, using only the rows dated before max_date
filt_data <- my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(filter(., my_dates < max_date)$x)
  )
# A tibble: 6 x 5
# Groups: grp [2]
my_dates grp x included_data my_mean
<date> <fct> <dbl> <lgl> <dbl>
1 2020-01-01 A 2 TRUE 5
2 2020-02-01 A 4 TRUE 5
3 2020-03-01 A 6 TRUE 5
4 2020-04-01 B 8 TRUE 5
5 2020-05-01 B 10 FALSE 5
6 2020-06-01 B 12 FALSE 5
The output I want looks like this, where the mean of the included data for group A is mean(c(2, 4, 6)) = 4 and for group B is mean(8) = 8:
my_dates grp x included_data my_mean
<date> <fct> <dbl> <lgl> <dbl>
1 2020-01-01 A 2 TRUE 4
2 2020-02-01 A 4 TRUE 4
3 2020-03-01 A 6 TRUE 4
4 2020-04-01 B 8 TRUE 8
5 2020-05-01 B 10 FALSE 8
6 2020-06-01 B 12 FALSE 8
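I am not sure what the correct combination of mutate and filter is, so any help would be appreciated, as would an explanation of why the approach above does not work as expected.
Thanks!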
Here, it is better to use the logical index in 'included_data' to subset the 'x' column rather than running another filter:
library(dplyr)
my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(x[included_data])) %>%
  ungroup()
-output
# A tibble: 6 x 5
# my_dates grp x included_data my_mean
# <date> <chr> <dbl> <lgl> <dbl>
#1 2020-01-01 A 2 TRUE 4
#2 2020-02-01 A 4 TRUE 4
#3 2020-03-01 A 6 TRUE 4
#4 2020-04-01 B 8 TRUE 8
#5 2020-05-01 B 10 FALSE 8
#6 2020-06-01 B 12 FALSE 8
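As a cross-check on the expected values, the same per-group means can be computed in base R without dplyr. This is only a sketch on the sample data from the question; the helper names `incl` and `group_means` are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the sample data from the question
my_df <- data.frame(
  my_dates = seq(as.Date("2020/1/1"), by = "month", length.out = 6),
  grp = c(rep("A", 3), rep("B", 3)),
  x = c(2, 4, 6, 8, 10, 12)
)
max_date <- as.Date("2020-05-01")

# Logical index of the rows that fall before the cutoff date
incl <- my_df$my_dates < max_date

# Mean of x per group, computed on the included rows only
group_means <- tapply(my_df$x[incl], my_df$grp[incl], mean)

# Look up each row's group mean by group name
my_df$my_mean <- as.vector(group_means[as.character(my_df$grp)])

print(my_df)
```

Note that a group with no rows before the cutoff has no entry in `group_means`, so its rows would get NA here.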
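As for why the OP's code does not work: inside the pipe, `.` refers to the full dataset, so filter() takes its subset from the complete data rather than from the current group. We can use cur_data() instead of `.`: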
my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(filter(cur_data(), my_dates < max_date)$x)) %>%
  ungroup()
# A tibble: 6 x 5
# my_dates grp x included_data my_mean
# <date> <chr> <dbl> <lgl> <dbl>
#1 2020-01-01 A 2 TRUE 4
#2 2020-02-01 A 4 TRUE 4
#3 2020-03-01 A 6 TRUE 4
#4 2020-04-01 B 8 TRUE 8
#5 2020-05-01 B 10 FALSE 8
#6 2020-06-01 B 12 FALSE 8
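The difference between `.` and the current group can be made visible by counting rows. The sketch below also shows `pick(everything())`, which, as far as I know, is the replacement for `cur_data()` since that function was deprecated in dplyr 1.1.0; it therefore assumes dplyr >= 1.1.0 is installed. The column names `rows_in_dot` and `rows_in_group` are made up for this illustration.

```r
library(dplyr)

# Rebuild the sample data from the question
my_df <- data.frame(
  my_dates = seq(as.Date("2020/1/1"), by = "month", length.out = 6),
  grp = c(rep("A", 3), rep("B", 3)),
  x = c(2, 4, 6, 8, 10, 12)
)
max_date <- as.Date("2020-05-01")

# `.` is the whole data frame piped into mutate(), so nrow(.) is the same
# in every group, while n() counts only the rows of the current group.
check <- my_df %>%
  group_by(grp) %>%
  mutate(rows_in_dot = nrow(.),
         rows_in_group = n()) %>%
  ungroup()

# Assumption: dplyr >= 1.1.0, where pick(everything()) returns the
# non-grouping columns of the current group.
res <- my_df %>%
  group_by(grp) %>%
  mutate(included_data = my_dates < max_date,
         my_mean = mean(filter(pick(everything()), my_dates < max_date)$x)) %>%
  ungroup()

print(check)
print(res)
```

On older dplyr versions `pick()` does not exist, in which case the `cur_data()` form above or the plain `x[included_data]` subset is the one to use.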