I have a data frame, for example:
> tab
Groups Species Value
1 Group1 Sp1 1
2 Group1 Sp1 4
3 Group1 Sp2 78
4 Group1 Sp3 NA
5 Group1 Sp4 NA
6 Group2 Sp2 3
7 Group2 Sp3 9
8 Group2 Sp4 8
9 Group3 Sp1 9
10 Group3 Sp3 10
11 Group3 Sp3 110
12 Group3 Sp3 14
I am trying to keep only the groups in which all values are < 80.
I tried:
tab %>%
group_by(Groups) %>%
filter(all(Value < 80))
but I don't know how to ignore the NA values in the filter.
Here I should get:
> tab
Groups Species Value
1 Group1 Sp1 1
2 Group1 Sp1 4
3 Group1 Sp2 78
4 Group1 Sp3 NA
5 Group1 Sp4 NA
6 Group2 Sp2 3
7 Group2 Sp3 9
8 Group2 Sp4 8
Does anyone have a solution? Thanks.
What if I also have:
> tab
Groups Species Value sp mrca
1 Group1 Sp1 1 3 3
2 Group1 Sp1 4 3 3
3 Group1 Sp2 78 NA NA
4 Group1 Sp3 NA 3 12
5 Group1 Sp4 NA 3 3
6 Group2 Sp2 3 2 3
7 Group2 Sp3 9 2 40
8 Group2 Sp4 8 NA NA
9 Group3 Sp1 9 2 2
10 Group3 Sp3 10 3 3
11 Group3 Sp3 110 3 2
12 Group3 Sp3 14 2 3
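I want to keep all groups whose values are all < 80 and whose differences between sp and mrca are all in 0:9 (i.e. abs(sp - mrca) %in% 0:9), ignoring NA.
I tried with your answer: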
tab %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value))) %>%
filter((all(abs(sp - mrca) %in% 0:9)|is.na(sp) & is.na(mrca)))
but it doesn't seem to be the right code.
I should get:
> tab
Groups Species Value sp mrca
1 Group1 Sp1 1 3 3
2 Group1 Sp1 4 3 3
3 Group1 Sp2 78 NA NA
4 Group1 Sp3 NA 3 12
5 Group1 Sp4 NA 3 3
We can combine the comparison with is.na using |:
tab %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value)))
# A tibble: 8 x 3
# Groups: Groups [2]
# Groups Species Value
# <chr> <chr> <int>
#1 Group1 Sp1 1
#2 Group1 Sp1 4
#3 Group1 Sp2 78
#4 Group1 Sp3 NA
#5 Group1 Sp4 NA
#6 Group2 Sp2 3
#7 Group2 Sp3 9
#8 Group2 Sp4 8
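The problem in the OP's code is that, when Value < 80 is wrapped in all(), the comparison returns NA for the NA values. all() then also returns NA (because no element is FALSE, but some are unknown) instead of a logical TRUE/FALSE, and filter() drops rows whose condition is NA by default.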
For a better understanding, check the output of
tab %>%
group_by(Groups) %>%
mutate(ind = all(Value < 80))
and compare it with
tab %>%
group_by(Groups) %>%
mutate(ind = all(Value < 80| is.na(Value)))
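The same comparison can be sketched without dplyr by computing one indicator per group with tapply(). This rebuilds tab with data.frame() so the block is self-contained; ind_naive and ind_fixed are illustrative names.

```r
tab <- data.frame(
  Groups = rep(c("Group1", "Group2", "Group3"), times = c(5, 3, 4)),
  Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2", "Sp3", "Sp4",
              "Sp1", "Sp3", "Sp3", "Sp3"),
  Value = c(1L, 4L, 78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L)
)

# Naive indicator: Group1 becomes NA because of its missing values
ind_naive <- tapply(tab$Value < 80, tab$Groups, all)
# NA-aware indicator: missing values count as passing
ind_fixed <- tapply(tab$Value < 80 | is.na(tab$Value), tab$Groups, all)

print(rbind(naive = ind_naive, fixed = ind_fixed))
```

Only Group1 differs between the two rows, which is exactly the group that filter() silently dropped.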
Or using data.table:
library(data.table)
setDT(tab)[, .SD[all(Value < 80 | is.na(Value))], Groups]
Or using base R:
tab[with(tab, ave(Value < 80 | is.na(Value), Groups, FUN = all)),]
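Another base R variant, offered as an alternative sketch rather than part of the original answer, decides per group with all(..., na.rm = TRUE) and then keeps the rows of the passing groups. keep_groups and res are hypothetical names.

```r
tab <- data.frame(
  Groups = rep(c("Group1", "Group2", "Group3"), times = c(5, 3, 4)),
  Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2", "Sp3", "Sp4",
              "Sp1", "Sp3", "Sp3", "Sp3"),
  Value = c(1L, 4L, 78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L)
)

# One flag per group: every non-missing value is < 80 (NAs are dropped)
keep_groups <- tapply(tab$Value, tab$Groups,
                      function(v) all(v < 80, na.rm = TRUE))
# Keep the rows whose group passed
res <- tab[tab$Groups %in% names(keep_groups)[keep_groups], ]
print(res)
```

A group consisting only of NA values would be kept here, since all(logical(0)) is TRUE; the Value < 80 | is.na(Value) form behaves the same way.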
For the second dataset:
tab1 %>%
group_by(Groups) %>%
filter(all(Value < 80 |is.na(Value)),
all(na.omit(abs(sp-mrca)) %in% 0:9))
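Why na.omit() is needed here: %in% never returns NA, so a missing difference is simply reported as "not found" (FALSE). In the OP's second attempt the NA check sits outside all(), so for Group1 all(...) is FALSE and the condition reduces to is.na(sp) & is.na(mrca), keeping only the all-NA row. A small sketch using Group1's sp and mrca values typed in by hand (d is an illustrative name):

```r
# abs(sp - mrca) for Group1: 0 0 NA 9 0
d <- abs(c(3L, 3L, NA, 3L, 3L) - c(3L, 3L, NA, 12L, 3L))

d %in% 0:9                     # TRUE TRUE FALSE TRUE TRUE: NA is "not found"
all(d %in% 0:9)                # FALSE, so Group1 would be dropped
all(na.omit(d) %in% 0:9)       # TRUE: discard the missing difference first
all(is.na(d) | d %in% 0:9)     # TRUE: same idea without na.omit()
```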
tab <- structure(list(Groups = c("Group1", "Group1", "Group1", "Group1",
"Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group3",
"Group3"), Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2",
"Sp3", "Sp4", "Sp1", "Sp3", "Sp3", "Sp3"), Value = c(1L, 4L,
78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
tab1 <- structure(list(Groups = c("Group1", "Group1", "Group1", "Group1",
"Group1", "Group2", "Group2", "Group2", "Group3", "Group3", "Group3",
"Group3"), Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2",
"Sp3", "Sp4", "Sp1", "Sp3", "Sp3", "Sp3"), Value = c(1L, 4L,
78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L), sp = c(3L, 3L,
NA, 3L, 3L, 2L, 2L, NA, 2L, 3L, 3L, 2L), mrca = c(3L, 3L, NA,
12L, 3L, 3L, 40L, NA, 2L, 3L, 2L, 3L)),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
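For readers without dplyr, the second filter can be sketched in base R with the same ave() idea used for the first dataset. This is an assumed equivalent, not part of the original answer; tab1 is rebuilt with data.frame() (same values as above) so the block runs on its own, and d, ok_value, ok_diff, keep, res1 are illustrative names.

```r
tab1 <- data.frame(
  Groups = rep(c("Group1", "Group2", "Group3"), times = c(5, 3, 4)),
  Species = c("Sp1", "Sp1", "Sp2", "Sp3", "Sp4", "Sp2", "Sp3", "Sp4",
              "Sp1", "Sp3", "Sp3", "Sp3"),
  Value = c(1L, 4L, 78L, NA, NA, 3L, 9L, 8L, 9L, 10L, 110L, 14L),
  sp = c(3L, 3L, NA, 3L, 3L, 2L, 2L, NA, 2L, 3L, 3L, 2L),
  mrca = c(3L, 3L, NA, 12L, 3L, 3L, 40L, NA, 2L, 3L, 2L, 3L)
)

d <- abs(tab1$sp - tab1$mrca)
# Row-level conditions, with missing values treated as passing
ok_value <- tab1$Value < 80 | is.na(tab1$Value)
ok_diff <- is.na(d) | d %in% 0:9
# ave() spreads each group's all() result back to every row of that group
keep <- ave(ok_value, tab1$Groups, FUN = all) &
  ave(ok_diff, tab1$Groups, FUN = all)
res1 <- tab1[keep, ]
print(res1)
```

Group2 fails on the 2 vs 40 row (difference 38) and Group3 on the value 110, so only the five Group1 rows remain, matching the expected output in the question.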