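Here is an example of what I'm doing. The data.frames usually have thousands of records, and in practice I often have more conditions to check inside the if().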
library(tidyverse)
# example df 1
coll <- data.frame(id = c("alpha", "alpha", "beta", "beta", "gamma", "delta", "epsilon"),
                   frequency = c("12.340", "23.340", "12.560", "15.670", "56.230", "12.890", "89.430"),
                   start = c("2010-01-01", "2015-01-01", "2011-02-02", "2017-02-02", rep("2019-01-01", 3)),
                   end = c("2011-02-02", NA, "2012-01-01", NA, "2018-02-02", rep(NA, 2))) %>%
  # rows that already have an end date are marked as removed
  mutate(still.active = ifelse(!is.na(end), "No", NA),
         reason = ifelse(!is.na(end), "Removed", NA)) %>%
  mutate_all(as.character)
# example df 2
mort <- data.frame(id = c("alpha", "beta", "gamma", "delta", "zeta"),
                   frequency = c("23.340", "15.670", "56.230", "12.890", NA),
                   date = c("2016-01-01", "2018-01-01", rep("2020-01-01", 3)),
                   type = c(rep(1, 2), rep(2, 3))
                   ) %>%
  mutate_all(as.character)
for(i in seq_len(nrow(coll))){
  for(j in seq_len(nrow(mort))){
    if(coll$id[i] == mort$id[j] &&                # if these match
       coll$frequency[i] == mort$frequency[j] &&  # and these match
       is.na(coll$end[i]) &&                      # and the value I want to fill in is currently blank
       mort$type[j] == "1"                        # and this other condition is met
    ){
      coll$end[i] <- as.character(mort$date[j])   # then assign these cells these values
      coll$still.active[i] <- "No"
      coll$reason[i] <- "Said so"
    }
  }
}
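The nested for loops do exactly what I need, but in practice they become very slow, and I would like to learn a better way. If I only needed to match the values of one column across the two data.frames, indexing would be easy, for example: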
df <- data.frame(id = c("one", "two", "three")) %>% arrange(desc(id))
df2 <- data.frame(id = c("one", "two", "three"),
                  frequency = c("23.340", "15.670", "56.230"))
df$freq <- df2[match(df$id, df2$id), "frequency"]
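For reference, the same match() idiom can be stretched to two columns by matching on a pasted composite key. This is only a minimal sketch on made-up toy data: the names coll_small, mort_small, key_coll, key_mort, idx and hit are hypothetical, and the "|" separator assumes that character never appears in id or frequency.

```r
# Toy data (hypothetical); column names mirror the example data.frames.
coll_small <- data.frame(
  id        = c("alpha", "alpha", "beta"),
  frequency = c("12.340", "23.340", "15.670"),
  end       = c("2011-02-02", NA, NA),
  stringsAsFactors = FALSE
)
mort_small <- data.frame(
  id        = c("alpha", "beta"),
  frequency = c("23.340", "15.670"),
  date      = c("2016-01-01", "2018-01-01"),
  type      = c("1", "2"),
  stringsAsFactors = FALSE
)

# Build one key per row from both columns, then look it up with match().
key_coll <- paste(coll_small$id, coll_small$frequency, sep = "|")
key_mort <- paste(mort_small$id, mort_small$frequency, sep = "|")
idx <- match(key_coll, key_mort)  # NA where a row has no partner in mort_small

# Rows to update: a partner exists, end is blank, and the partner has type "1".
hit <- !is.na(idx) & is.na(coll_small$end) & mort_small$type[idx] == "1"
hit[is.na(hit)] <- FALSE          # treat an unknown type as "no update"

coll_small$end[hit] <- mort_small$date[idx[hit]]
```

Note that match() returns only the first matching row, so this assumes each id/frequency pair appears at most once in mort_small.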
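But I'm not sure how to get there when there are more conditions, and even if I could, I think it might be hard for other people to read and work out what is going on. One thing I like about nested for loops is that they are easy to read. Maybe I'm just used to them. Could I use nested ifelse() statements instead? What other options are there?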
This can be done with left_join and ifelse (or case_when):
coll %>%
  left_join(mort, by = "id") %>%
  mutate(tmp = (frequency.x == frequency.y) & is.na(end) & type == "1") %>%
  mutate(end = ifelse(tmp, as.character(date), end),
         still.active = ifelse(tmp, "No", still.active),
         reason = ifelse(tmp, "Said so", reason)) %>%
  select(id, frequency = frequency.x, start, end, still.active, reason)
       id frequency      start        end still.active  reason
1   alpha    12.340 2010-01-01 2011-02-02           No Removed
2   alpha    23.340 2015-01-01 2016-01-01           No Said so
3    beta    12.560 2011-02-02 2012-01-01           No Removed
4    beta    15.670 2017-02-02 2018-01-01           No Said so
5   gamma    56.230 2019-01-01 2018-02-02           No Removed
6   delta    12.890 2019-01-01       <NA>         <NA>    <NA>
7 epsilon    89.430 2019-01-01       <NA>         <NA>    <NA>
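A possible variation, sketched here on a small toy data set rather than the full example: filter mort down to the rows that can trigger an update first, then join on both id and frequency so that no .x/.y columns need comparing afterwards. The names coll_demo, mort_demo, updates, fill and result are hypothetical, and the sketch assumes each id/frequency pair occurs at most once among the type "1" rows (otherwise the join would duplicate rows).

```r
library(dplyr)

# Toy data (hypothetical) with the same column names as the example above.
coll_demo <- data.frame(
  id           = c("alpha", "alpha", "delta", "epsilon"),
  frequency    = c("12.340", "23.340", "12.890", "89.430"),
  end          = c("2011-02-02", NA, NA, NA),
  still.active = c("No", NA, NA, NA),
  reason       = c("Removed", NA, NA, NA),
  stringsAsFactors = FALSE
)
mort_demo <- data.frame(
  id        = c("alpha", "delta"),
  frequency = c("23.340", "12.890"),
  date      = c("2016-01-01", "2020-01-01"),
  type      = c("1", "2"),
  stringsAsFactors = FALSE
)

# Keep only the rows that are allowed to trigger an update.
updates <- mort_demo %>%
  filter(type == "1") %>%
  select(id, frequency, date)

result <- coll_demo %>%
  left_join(updates, by = c("id", "frequency")) %>%
  mutate(
    # TRUE only where end is blank and the join found a date; never NA
    fill         = is.na(end) & !is.na(date),
    end          = if_else(fill, date, end),
    still.active = if_else(fill, "No", still.active),
    reason       = if_else(fill, "Said so", reason)
  ) %>%
  select(-date, -fill)
```

Because fill is never NA, rows without a partner in mort_demo keep whatever values they already had, which a condition that evaluates to NA inside ifelse() would not guarantee.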