Using dplyr's mutate function to conditionally create a new variable based on the current row

Posted by debugcn in Dev

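Steven Morrison

I am computing conditional averages for a large dataset of flu cases recorded by consecutive week over several years. The data is organized as follows:

[Data table]

What I would like to do is create a new column that holds the average number of cases for the same week in previous years. For example, for the row where Week.Number is 1 and Flu.Year is 2017, I would like the new column to give the average count over all rows with Week.Number == 1 and Flu.Year < 2017. Normally I would use the case_when() function to tabulate something like this conditionally. For example, when calculating the average weekly volume for each year, I used the following code: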

chcc %>%
  mutate(average = case_when(
    # one hand-written condition per year; each branch averages that year's counts
    Flu.Year == 2016 ~ mean(chcc$count[chcc$Flu.Year == 2016]),
    Flu.Year == 2017 ~ mean(chcc$count[chcc$Flu.Year == 2017]),
    Flu.Year == 2018 ~ mean(chcc$count[chcc$Flu.Year == 2018]),
    Flu.Year == 2019 ~ mean(chcc$count[chcc$Flu.Year == 2019])
  ))

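However, with 4 years of data times 52 weeks, spelling out every condition this way would take a great many repetitions. Is there a way to write this elegantly in dplyr? The problem I keep running into is that I want to pull values from the count column based on the Week.Number and Flu.Year of other rows, conditional on the Week.Number and Flu.Year of the current row, and I am not sure how to do that. Please let me know if I can provide further information or details.

Thanks, Steven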

dat <- tibble(
  Flu.Year = rep(2016:2019, each = 52),
  Week.Number = rep(1:52, 4),
  count = sample(1000, size = 52 * 4, replace = TRUE)
)

r2evans

This is poor form, and using $-indexing inside dplyr verbs can sometimes go wrong. I think a better way to get that average field is to group_by(Flu.Year) and calculate it directly.

library(dplyr)
set.seed(42)
dat <- tibble(
  Flu.Year = sample(2016:2020, size=100, replace=TRUE),
  count = sample(1000, size=100, replace=TRUE)
)

dat %>%
  group_by(Flu.Year) %>%
  mutate(average = mean(count)) %>%
  # just to show a quick summary
  slice(1:3) %>%
  ungroup()
# # A tibble: 15 x 3
#    Flu.Year count average
#       <int> <int>   <dbl>
#  1     2016   734    578.
#  2     2016   356    578.
#  3     2016   411    578.
#  4     2017   217    436.
#  5     2017   453    436.
#  6     2017   920    436.
#  7     2018   963    558 
#  8     2018   609    558 
#  9     2018   536    558 
# 10     2019   943    543.
# 11     2019   740    543.
# 12     2019   536    543.
# 13     2020   627    494.
# 14     2020   218    494.
# 15     2020   389    494.
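To illustrate the caution about $-indexing, here is a minimal sketch on a made-up four-row tibble (the data and the column names avg_dollar and avg_group are hypothetical, chosen only for this example). Inside a grouped mutate, dat$count reaches back to the whole original column, so the grouping has no effect on it, while the bare column name count is evaluated per group.

```r
library(dplyr)

# Hypothetical toy data: two years, two rows each
dat <- tibble(
  Flu.Year = c(2016, 2016, 2017, 2017),
  count    = c(10, 20, 30, 50)
)

res <- dat %>%
  group_by(Flu.Year) %>%
  mutate(
    avg_dollar = mean(dat$count),  # whole column, grouping is ignored
    avg_group  = mean(count)       # current group only
  ) %>%
  ungroup()

res
# avg_dollar is 27.5 on every row; avg_group is 15, 15, 40, 40
```

The case_when version in the question avoids this only because each branch filters the full column by hand, which is exactly the repetition that group_by removes.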

Another approach is to produce a summary table (just one row per year) and then join it back onto the original data.

dat %>%
  group_by(Flu.Year) %>%
  summarize(average = mean(count))
# # A tibble: 5 x 2
#   Flu.Year average
#      <int>   <dbl>
# 1     2016    578.
# 2     2017    436.
# 3     2018    558 
# 4     2019    543.
# 5     2020    494.

dat %>%
  group_by(Flu.Year) %>%
  summarize(average = mean(count)) %>%
  full_join(dat, by = "Flu.Year")
# # A tibble: 100 x 3
#    Flu.Year average count
#       <int>   <dbl> <int>
#  1     2016    578.   734
#  2     2016    578.   356
#  3     2016    578.   411
#  4     2016    578.   720
#  5     2016    578.   851
#  6     2016    578.   822
#  7     2016    578.   465
#  8     2016    578.   679
#  9     2016    578.    30
# 10     2016    578.   180
# # ... with 90 more rows
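As a rough check that the grouped mutate and the summarize-then-join approaches agree, the sketch below computes the yearly average both ways and compares them. It swaps in left_join starting from dat (rather than the full_join shown above) so that the original row order is kept; the names by_mutate, by_join and yearly are illustrative only.

```r
library(dplyr)

set.seed(42)
dat <- tibble(
  Flu.Year = sample(2016:2020, size = 100, replace = TRUE),
  count    = sample(1000, size = 100, replace = TRUE)
)

# Approach 1: grouped mutate
by_mutate <- dat %>%
  group_by(Flu.Year) %>%
  mutate(average = mean(count)) %>%
  ungroup()

# Approach 2: one-row-per-year summary, joined back onto the data
yearly <- dat %>%
  group_by(Flu.Year) %>%
  summarize(average = mean(count))

by_join <- dat %>%
  left_join(yearly, by = "Flu.Year")

# Both approaches should give the same average on every row
all.equal(by_mutate$average, by_join$average)
```

The join version is handy when the summary table is also needed on its own; the grouped mutate is shorter when it is not.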

Result after discussion in chat:

tibble(
  Flu.Year = rep(2016:2018, each = 3),
  Week.Number = rep(1:3, 3),
  count = 1:9
) %>%
  arrange(Flu.Year, Week.Number) %>%
  group_by(Week.Number) %>%
  mutate(year_week.average = lag(cumsum(count) / seq_along(count)))
# # A tibble: 9 x 4
# # Groups:   Week.Number [3]
#   Flu.Year Week.Number count year_week.average
#      <int>       <int> <int>             <dbl>
# 1     2016           1     1              NA  
# 2     2016           2     2              NA  
# 3     2016           3     3              NA  
# 4     2017           1     4               1  
# 5     2017           2     5               2  
# 6     2017           3     6               3  
# 7     2018           1     7               2.5
# 8     2018           2     8               3.5
# 9     2018           3     9               4.5
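The same idea can be sketched on data shaped like the question's 4 years by 52 weeks, with a brute-force comparison against the stated requirement (same Week.Number, earlier Flu.Year). This assumes exactly one row per year and week; with duplicate or missing rows the running mean would no longer line up with "all earlier years". The names prior_avg and brute are hypothetical.

```r
library(dplyr)

set.seed(1)
dat <- tibble(
  Flu.Year    = rep(2016:2019, each = 52),
  Week.Number = rep(1:52, 4),
  count       = sample(1000, size = 52 * 4, replace = TRUE)
)

res <- dat %>%
  arrange(Flu.Year, Week.Number) %>%
  group_by(Week.Number) %>%
  # running mean within each week, shifted by one year so the
  # current year's count is excluded
  mutate(prior_avg = lag(cumsum(count) / seq_along(count))) %>%
  ungroup()

# Brute force: for each row, average the counts of the same week in earlier years
brute <- sapply(seq_len(nrow(res)), function(i) {
  prior <- res$count[res$Week.Number == res$Week.Number[i] &
                     res$Flu.Year < res$Flu.Year[i]]
  if (length(prior) == 0) NA_real_ else mean(prior)
})

all.equal(res$prior_avg, brute)
```

The first year has no earlier data, so its prior_avg is NA, as in the small example above.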

Edited on 2021-04-2