I have a data frame that looks like this:
# A tibble: 9 x 5
# Groups:   group [3]
  group  year value1 value2 value3
  <int> <dbl>  <int>  <int>  <int>
1     1  2000     NA      3      4
2     1  2001      8      3      4
3     1  2002      4      3     NA
4     2  2000     NA     NA      1
5     2  2001      9     NA      1
6     2  2002      1     NA     NA
7     3  2000     NA      5     NA
8     3  2001      9      5     NA
9     3  2002     NA      5     NA
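I need a script that returns, for each column, the year of the first and the last non-NA value, regardless of group. Ideally the output should look like the table below. Note that the real data set is much larger.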
       start  end
value1  2001 2002
value2  2000 2002
value3  2000 2001
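We can reshape the data to "long" format, group by "name" and summarise to get the min and max "year". Because values_drop_na = TRUE drops the NA cells while reshaping, the min and max of the remaining years are exactly the first and last years with a non-NA value.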
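library(dplyr)
library(tidyr)
library(tibble)
df1 %>%
  ungroup() %>%   # the real data is grouped by 'group'; the result should ignore groups
  select(-group) %>%
  # long format: one row per (year, value column) pair, NA cells dropped
  pivot_longer(cols = starts_with('value'), values_drop_na = TRUE) %>%
  group_by(name) %>%
  # earliest and latest year that still has a value in each column
  summarise(start = min(year), end = max(year)) %>%
  column_to_rownames('name')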
# start end
#value1 2001 2002
#value2 2000 2002
#value3 2000 2001
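The same idea, taking the minimum and maximum year over the rows where a column is not NA, can also be sketched in base R without reshaping to long format. This is a minimal sketch, assuming the value columns are exactly those whose names start with "value" and that each of them has at least one non-NA entry (otherwise range() returns Inf/-Inf with a warning); value_cols and res are illustrative names.

```r
# Example data, equivalent to df1 in the question (ungrouped)
df1 <- data.frame(
  group  = rep(1:3, each = 3),
  year   = rep(2000:2002, times = 3),
  value1 = c(NA, 8L, 4L, NA, 9L, 1L, NA, 9L, NA),
  value2 = c(3L, 3L, 3L, NA, NA, NA, 5L, 5L, 5L),
  value3 = c(4L, 4L, NA, 1L, 1L, NA, NA, NA, NA)
)

# Columns to summarise (assumed to share the "value" prefix)
value_cols <- grep("^value", names(df1), value = TRUE)

# For each column: range() of the years whose cell is not NA.
# sapply() returns a 2 x k matrix (min, max); t() makes it k x 2.
res <- t(sapply(df1[value_cols], function(v) range(df1$year[!is.na(v)])))
colnames(res) <- c("start", "end")
res
```

Since it builds no long intermediate table, this may be lighter on memory for a large data set, though that has not been measured here.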
Or with melt from data.table:
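library(data.table)
# note: setDT() converts df1 itself to a data.table (by reference)
melt(setDT(df1), id.vars = c('year', 'group'), na.rm = TRUE)[,
  # per melted column: earliest and latest year that kept a value
  .(start = min(year), end = max(year)), by = .(variable)]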
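If the long copy created by melt is a concern for the much larger real data set, a possible data.table variant computes the range per column directly over .SD. This is a sketch, not benchmarked; dt, value_cols, rng and res are illustrative names, and it assumes each value column has at least one non-NA entry.

```r
library(data.table)

# Example data, equivalent to df1 in the question
dt <- data.table(
  group  = rep(1:3, each = 3),
  year   = rep(2000:2002, times = 3),
  value1 = c(NA, 8L, 4L, NA, 9L, 1L, NA, 9L, NA),
  value2 = c(3L, 3L, 3L, NA, NA, NA, 5L, 5L, 5L),
  value3 = c(4L, 4L, NA, 1L, 1L, NA, NA, NA, NA)
)

value_cols <- grep("^value", names(dt), value = TRUE)

# Two rows (min year, max year with a non-NA value), one column per value column.
# 'year' is referenced directly in j, outside .SDcols.
rng <- dt[, lapply(.SD, function(v) range(year[!is.na(v)])), .SDcols = value_cols]

# One row per value column with start/end
res <- data.table(variable = value_cols,
                  start    = unlist(rng[1L]),
                  end      = unlist(rng[2L]))
res
```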
Or we can use summarise_at (superseded by across() since dplyr 1.0.0, but still available):
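df1 %>%
  ungroup() %>%   # otherwise the ranges would be computed per group
  # per value column: range (min, max) of the years with a non-NA value,
  # stored in a list column so both numbers fit in one summary row
  summarise_at(vars(starts_with('value')), ~
    list(range(year[!is.na(.)]))) %>%
  unnest(everything()) %>%   # two rows: start years, then end years
  pivot_longer(everything()) # the three start years first, then the three end years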
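A possible across() version that returns the start/end layout from the question directly, instead of a long name/value table, is sketched below. It assumes the original column names contain no underscore (names_sep = "_" splits on the one across() adds) and that each value column has at least one non-NA entry; res is an illustrative name.

```r
library(dplyr)
library(tidyr)

# Example data, equivalent to df1 in the question
df1 <- data.frame(
  group  = rep(1:3, each = 3),
  year   = rep(2000:2002, times = 3),
  value1 = c(NA, 8L, 4L, NA, 9L, 1L, NA, 9L, NA),
  value2 = c(3L, 3L, 3L, NA, NA, NA, 5L, 5L, 5L),
  value3 = c(4L, 4L, NA, 1L, 1L, NA, NA, NA, NA)
)

res <- df1 %>%
  ungroup() %>%
  # one summary row: value1_start, value1_end, value2_start, ...
  summarise(across(starts_with("value"),
                   list(start = ~ min(year[!is.na(.x)]),
                        end   = ~ max(year[!is.na(.x)])))) %>%
  # split "value1_start" into name = "value1" and an output column "start"
  pivot_longer(everything(), names_to = c("name", ".value"), names_sep = "_")
res
```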
# Example data (df1) used in the code above
df1 <- structure(list(group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
    year = c(2000L, 2001L, 2002L, 2000L, 2001L, 2002L, 2000L,
    2001L, 2002L), value1 = c(NA, 8L, 4L, NA, 9L, 1L, NA, 9L,
    NA), value2 = c(3L, 3L, 3L, NA, NA, NA, 5L, 5L, 5L), value3 = c(4L,
    4L, NA, 1L, 1L, NA, NA, NA, NA)), class = "data.frame",
    row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))