I have the data frame below in R:
Unique_ID D_1 ST_1 D_2 ST_2 D_3 ST_3
JJ-123 2018-04-01 No Range 2018-03-12 50-80 2018-02-01 10-30
JJ-113 2018-04-01 50-80 2018-03-05 50-80 2018-02-01 10-30
JJ-457 2018-04-03 10-30 2018-03-12 1-5 2018-02-01 No Range
JJ-879 2018-04-01 No Range 2018-03-12 50-80 2018-02-01 50-80
Note: for simplicity I have shown only three ST_ columns, but the original data frame has columns up to ST_38.
Input:
d <- structure(list(Unique_ID = c("JJ-123", "JJ-113", "JJ-457", "JJ-879"
), D_1 = c("01/04/2018", "01/04/2018", "03/04/2018", "01/04/2018"
), ST_1 = c("No Range", "50-80", "10-30", "No Range"), D_2 = c("12/03/2018",
"05/03/2018", "12/03/2018", "12/03/2018"), ST_2 = c("50-80",
"50-80", "1-5", "50-80"), D_3 = c("01/02/2018", "01/02/2018",
"01/02/2018", "01/02/2018"), ST_3 = c("10-30", "10-30", "No Range",
"50-80")), class = "data.frame", row.names = c(NA, -4L))
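Using the data frame above, I want to get, for each Unique_ID, the oldest date on which the ST_ value changed to 10-30 and the oldest date on which it changed to 50-80 (NA if that value never occurs).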
Expected output:
Unique_ID 10-30 50-80
JJ-123 2018-02-01 2018-03-12
JJ-113 2018-02-01 2018-03-05
JJ-457 2018-04-03 NA
JJ-879 NA 2018-02-01
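A solution using tidyr and dplyr:
library(tidyr)
library(dplyr)
d %>%                                              # d is the data frame from the dput() above
  gather("variable", "value", -Unique_ID) %>%      # long format: one row per ID and column
  separate(variable, c("variable", "number")) %>%  # split e.g. "D_1" into "D" and "1"
  spread(variable, value) %>%                      # columns: Unique_ID, number, D, ST
  mutate(D = as.Date(D, format = "%d/%m/%Y")) %>%  # dates are day/month/year strings
  filter(ST %in% c("10-30", "50-80")) %>%          # keep only the two ranges of interest
  group_by(Unique_ID, ST) %>%
  filter(D == min(D)) %>%                          # earliest date per ID and range
  select(-number) %>%
  spread(ST, D)                                    # one column per range; absent ranges become NA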
## # A tibble: 4 x 3
## # Groups: Unique_ID [4]
## Unique_ID `10-30` `50-80`
## * <chr> <date> <date>
## 1 JJ-113 2018-02-01 2018-03-05
## 2 JJ-123 2018-02-01 2018-03-12
## 3 JJ-457 2018-04-03 NA
## 4 JJ-879 NA 2018-02-01
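In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). Below is a minimal sketch of the same pipeline written with those functions. It assumes tidyr >= 1.0 and dplyr >= 1.0 (for the `.groups` argument). The `.value` sentinel splits each D_n/ST_n pair into a D column and an ST column, so the pipeline should work unchanged for columns up to ST_38, provided every ST_n has a matching D_n. It uses summarise(min(D)) instead of filter(D == min(D)) so that a tie on the earliest date still gives one row per ID and range. The object name `result` is only for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (dates stored as day/month/year strings)
d <- structure(list(Unique_ID = c("JJ-123", "JJ-113", "JJ-457", "JJ-879"
), D_1 = c("01/04/2018", "01/04/2018", "03/04/2018", "01/04/2018"
), ST_1 = c("No Range", "50-80", "10-30", "No Range"), D_2 = c("12/03/2018",
"05/03/2018", "12/03/2018", "12/03/2018"), ST_2 = c("50-80",
"50-80", "1-5", "50-80"), D_3 = c("01/02/2018", "01/02/2018",
"01/02/2018", "01/02/2018"), ST_3 = c("10-30", "10-30", "No Range",
"50-80")), class = "data.frame", row.names = c(NA, -4L))

result <- d %>%
  # "D_1" -> value goes to column D, number "1"; "ST_1" -> column ST, number "1"
  pivot_longer(-Unique_ID, names_to = c(".value", "number"), names_sep = "_") %>%
  mutate(D = as.Date(D, format = "%d/%m/%Y")) %>%
  filter(ST %in% c("10-30", "50-80")) %>%
  # earliest date per ID and range; summarise() keeps one row even if dates tie
  group_by(Unique_ID, ST) %>%
  summarise(D = min(D), .groups = "drop") %>%
  # one column per range; ID/range pairs that never occur are filled with NA
  pivot_wider(names_from = ST, values_from = D)

print(result)
```

On the sample data, `result` should contain the same dates as the expected output, with rows sorted by Unique_ID as in the answer's output.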