Diff on each subset of a data frame column


Melissa Salazar

I have a data frame with ID, year, and month columns. I need to group by year and month and get the unique IDs in each group. I then want to compare each group's unique IDs with those of the prior year/month group and count how many IDs were added and how many were removed.

Kind of shooting in the dark, but I tried the following, which doesn't work:

connections <- df %>%
  group_by(year, month) %>%
  arrange(year, month) %>%
  diff_data(unique(as.vector(~ID)), lag(unique(as.vector(~ID))))

Sample Data

df <- data.frame(ID = c("A1", "A2", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
                 year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
                 month = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

Desired Output

Ben

First, I would aggregate on both month and year. This approach lists the IDs added and deleted for each month, compared with the same month in the previous year present in the data, and takes the length of each list to count how many were added and deleted.
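The comparison itself rests on base R's setdiff(). As a minimal illustration, using two made-up ID vectors (previous and current are hypothetical names, not taken from the question's data):

```r
# Hypothetical ID sets for two periods (illustration only)
previous <- c("A1", "A2", "A3")
current  <- c("A1", "A2", "A4")

# IDs present now but not before
added <- setdiff(current, previous)
# IDs present before but not now
deleted <- setdiff(previous, current)

length(added)    # number of added IDs
length(deleted)  # number of deleted IDs
```

The pipeline that follows applies the same two calls to every month, pairing each year with the one before it.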

library(tidyverse)

df %>%
  # one row per (year, month) holding that group's unique IDs
  aggregate(ID ~ year + month, ., unique, drop = FALSE) %>%
  # compare each month with the same month of the previous year
  group_by(month) %>%
  arrange(year) %>%
  mutate(addedID = mapply(setdiff, ID, lag(ID), SIMPLIFY = FALSE),
         num_addedID = lengths(addedID),
         deletedID = mapply(setdiff, lag(ID), ID, SIMPLIFY = FALSE),
         # the first year of each month has no predecessor, so drop the NA from lag()
         num_deletedID = sapply(deletedID, function(x) length(na.omit(x)))) %>%
  ungroup() %>%
  arrange(month, year) %>%
  as.data.frame()

Output

  year month ID addedID num_addedID deletedID num_deletedID
1 2010     1 A1      A1           1        NA             0
2 2011     1 A1                   0                       0
3 2012     1 A1                   0                       0
4 2010     2 A3      A3           1        NA             0
5 2011     2 A2      A2           1        A3             1
6 2012     2 A4      A4           1        A2             1
7 2010     3 A3      A3           1        NA             0
8 2011     3 A4      A4           1        A3             1
9 2012     3 A5      A5           1        A4             1

Data

df <- data.frame(ID=c("A1", "A3", "A3", "A1", "A2","A4", "A1", "A4", "A5"),
                 year= c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012), 
                 month= c(1, 2, 3, 1, 2, 3, 1, 2, 3))
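In the data above every (year, month) group holds exactly one ID. With several IDs per group, the ID column returned by aggregate() may no longer be a simple vector, so here is a rough base R sketch that keeps one character vector of IDs per group instead. The function name id_changes is made up for this example, and note one assumption that differs from the pipeline: it compares each group with the same month of the calendar year before (year - 1), treating a missing year as an empty set, rather than with the previous year that happens to be present.

```r
df <- data.frame(ID = c("A1", "A3", "A3", "A1", "A2", "A4", "A1", "A4", "A5"),
                 year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
                 month = c(1, 2, 3, 1, 2, 3, 1, 2, 3))

# Hypothetical helper: count IDs added and deleted per (year, month),
# relative to the same month one calendar year earlier.
id_changes <- function(df) {
  # one row per distinct (year, month), ordered by month then year
  groups <- unique(df[c("year", "month")])
  groups <- groups[order(groups$month, groups$year), ]
  rownames(groups) <- NULL

  # unique IDs of one (year, month) pair; character(0) if the pair is absent
  ids_for <- function(y, m) {
    unique(as.character(df$ID[df$year == y & df$month == m]))
  }

  current  <- Map(ids_for, groups$year, groups$month)
  previous <- Map(ids_for, groups$year - 1, groups$month)

  groups$num_added   <- lengths(Map(setdiff, current, previous))
  groups$num_deleted <- lengths(Map(setdiff, previous, current))
  groups
}

id_changes(df)
```

On this data the num_added and num_deleted columns should match num_addedID and num_deletedID in the output shown earlier.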
