I have a data frame with ID, year, and month columns. I need to group by year and month and get the unique IDs in each group. I then want to compare each group's unique IDs with those of the prior year/month group and count how many IDs were added and how many were removed.
Kind of shooting in the dark, but I tried the following, which doesn't work:
connections <- df %>%
group_by(year, month) %>%
arrange(year, month) %>%
diff_data(unique(as.vector(~ID)), lag(unique(as.vector(~ID))))
Sample Data
df <- data.frame(ID = c("A1", "A2", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
                 year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
                 month = c(1, 2, 3, 1, 2, 3, 1, 2, 3))
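I would first use aggregate on both month and year to collect the unique IDs per group. Within each month, the rows are then ordered by year, so each group is compared with the same month of the previous year. This approach lists all IDs added and deleted for each month and uses length to count how many were added and deleted.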
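library(tidyverse)

df %>%
  # One row per year/month combination, holding that group's unique IDs
  aggregate(ID ~ year + month, ., unique, drop = FALSE) %>%
  # Within each month, order by year so lag() is the same month of the prior year
  group_by(month) %>%
  arrange(year) %>%
  mutate(addedID = mapply(setdiff, ID, lag(ID), SIMPLIFY = FALSE),
         num_addedID = lapply(addedID, length),
         deletedID = mapply(setdiff, lag(ID), ID, SIMPLIFY = FALSE),
         # The first year has no predecessor (NA), so drop NA before counting
         num_deletedID = lapply(deletedID, function(x) length(na.omit(x)))) %>%
  ungroup() %>%
  arrange(month, year) %>%
  as.data.frame()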
Output
  year month ID addedID num_addedID deletedID num_deletedID
1 2010     1 A1      A1           1        NA             0
2 2011     1 A1                   0                       0
3 2012     1 A1                   0                       0
4 2010     2 A3      A3           1        NA             0
5 2011     2 A2      A2           1        A3             1
6 2012     2 A4      A4           1        A2             1
7 2010     3 A3      A3           1        NA             0
8 2011     3 A4      A4           1        A3             1
9 2012     3 A5      A5           1        A4             1
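The added and deleted columns both come from setdiff, which is asymmetric: the order of its two arguments decides which direction of change is reported. As a minimal illustration, here are the 2010 and 2011 IDs from the question's sample data compared year to year (the names prev and curr are made up for this sketch):

```r
# IDs from the question's sample data: 2010 as the earlier set, 2011 as the later one
prev <- c("A1", "A2", "A3")
curr <- c("A1", "A2", "A4")

added   <- setdiff(curr, prev)  # in curr but not in prev
deleted <- setdiff(prev, curr)  # in prev but not in curr

added            # "A4"
deleted          # "A3"
length(added)    # 1
length(deleted)  # 1
```

Swapping the arguments is all that separates addedID from deletedID in the pipeline above.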
Data
df <- data.frame(ID = c("A1", "A3", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
                 year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
                 month = c(1, 2, 3, 1, 2, 3, 1, 2, 3))
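If "prior group" is instead read as the chronologically preceding year/month group, rather than the same month one year earlier, a base R variant might look like the sketch below. It is only one possible reading of the question, not the method used above; the names key, ids, prev, and changes are made up for illustration, and the first period is assumed to count all of its IDs as added.

```r
# Sample data from the answer above
df <- data.frame(ID = c("A1", "A3", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
                 year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
                 month = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Zero-padded "YYYY-MM" keys sort chronologically when split() orders them
key <- sprintf("%d-%02d", as.integer(df$year), as.integer(df$month))
ids <- lapply(split(df$ID, key), unique)  # one set of unique IDs per period

# IDs of the preceding period; the first period has no predecessor
prev <- c(list(character(0)), ids[-length(ids)])

changes <- data.frame(
  period      = names(ids),
  num_added   = lengths(Map(setdiff, ids, prev)),  # in this period, not the previous
  num_deleted = lengths(Map(setdiff, prev, ids)),  # in the previous period, not this one
  row.names   = NULL
)
changes
```

Because each period's IDs are kept as a list element, this also works when a year/month group contains more than one ID.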