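In the example below, I want to create a summary statistic grouped by two variables. When I use dplyr::group_by, I get the correct answer; when I use dplyr::group_by_, it summarizes one more level than I want.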
library(dplyr)
set.seed(919)
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(3, 3, 4, 4, 5, 5),
  x = runif(6)
)
# Gives correct answer
df %>%
  group_by(a, b) %>%
  summarize(total = sum(x))
# Source: local data frame [4 x 3]
# Groups: a [?]
#
# a b total
# <dbl> <dbl> <dbl>
# 1 1 3 1.5214746
# 2 1 4 0.7150204
# 3 2 4 0.1234555
# 4 2 5 0.8208454
# Wrong answer -- too many levels summarized
df %>%
  group_by_(c("a", "b")) %>%
  summarize(total = sum(x))
# # A tibble: 2 × 2
# a total
# <dbl> <dbl>
# 1 1 2.2364950
# 2 2 0.9443009
What is going on here?
If you want to use a vector of variable names, you can pass it to the .dots argument, like this:
df %>%
  group_by_(.dots = c("a", "b")) %>%
  summarize(total = sum(x))
#Source: local data frame [4 x 3]
#Groups: a [?]
# a b total
# <dbl> <dbl> <dbl>
#1 1 3 1.5214746
#2 1 4 0.7150204
#3 2 4 0.1234555
#4 2 5 0.8208454
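A likely reason the unnamed vector in group_by_(c("a", "b")) collapses to a single grouping variable: each positional argument seems to be treated as one expression, so a length-two character vector is parsed and only its first element is kept. That is an assumption about the internals of the lazyeval-based underscore verbs, not something confirmed here. The base R sketch below only shows the parsing behaviour such a step would rely on.

```r
# Parsing a length-2 character vector yields two separate expressions.
exprs <- parse(text = c("a", "b"))
n_exprs <- length(exprs)

# Keeping only the first one (assumed to be what happens to an unnamed
# character-vector argument) leaves just the symbol `a`; `b` is dropped.
first_expr <- exprs[[1]]

print(n_exprs)
print(first_expr)
```

With .dots, each element of the vector is presumably converted on its own, which would explain why both a and b survive as grouping variables.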
Alternatively, you can pass the names as separate arguments, the same way you would with the NSE version:
df %>%
  group_by_("a", "b") %>%
  summarize(total = sum(x))
#Source: local data frame [4 x 3]
#Groups: a [?]
# a b total
# <dbl> <dbl> <dbl>
#1 1 3 1.5214746
#2 1 4 0.7150204
#3 2 4 0.1234555
#4 2 5 0.8208454
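For readers on a current dplyr: the underscore verbs such as group_by_ have been deprecated since dplyr 0.7.0. A minimal sketch of the same grouping from a character vector, assuming dplyr 1.0.0 or later (for across() and the .groups argument), could look like this. The names vars and res are illustrative only.

```r
library(dplyr)

set.seed(919)
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = c(3, 3, 4, 4, 5, 5),
  x = runif(6)
)

# Hypothetical helper variable holding the grouping column names
vars <- c("a", "b")

# all_of() selects the columns named in `vars`; across() hands them to group_by()
res <- df %>%
  group_by(across(all_of(vars))) %>%
  summarize(total = sum(x), .groups = "drop")

print(res)
```

This should give one row per distinct (a, b) pair, matching the grouping of the group_by(a, b) call in the question.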