I have a data frame that looks like this:
id <- c(1, 1, 1, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
type <- c("A", "B", "A", "B", "B")
df <- data.frame(id, date, type)
# Convert the dd-mm-yy strings to Date objects
df$date <- as.Date(as.character(df$date), format = "%d-%m-%y")
What I want is to add a new column that contains the earliest date for each combination of ID and type. This first attempt works fine; it aggregates and merges based only on the ID:
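# Earliest date per id; aggregate() names the grouping column Group.1
d = aggregate(df$date, by=list(df$id), min)
# Join the per-id minimum back onto every row
df2 = merge(df, d, by.x="id", by.y="Group.1")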
What I actually want, though, is to also group by type and get this result:
data.frame(df2, desired=c("2007-11-30","2007-11-01", "2007-11-30","2007-12-17","2007-12-17"))
I've tried a lot of possibilities. I think it can be done with lists, but I'm at a loss as to how:
d = aggregate(df$date, by=list(df$id, df$type), min)
# And merge the result of aggregate with the original data frame
df2 = merge(df,d,by.x=list("id","type"),by.y=list("Group.1","Group.2"))
For this simple example I could split the types into separate data frames, build the new column in each, and then combine the results, but in reality there are many types and a third column that has to be grouped the same way, so that would not be practical.
Thank you!
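The merge attempt in the question is close. `merge()` expects `by.x` and `by.y` as character vectors of column names (or column numbers) and does not accept a `list()` there, so swapping `list()` for `c()` should give a working two-step version (summarise, then join). A minimal sketch that rebuilds the question's sample data; the result column name `earliestdateidtype` is an arbitrary choice, not something `aggregate()` produces:

```r
# Question's sample data
id <- c(1, 1, 1, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
type <- c("A", "B", "A", "B", "B")
df <- data.frame(id, date, type)
df$date <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Step 1: earliest date for every (id, type) combination that occurs.
# aggregate() names the grouping columns Group.1 and Group.2; the result is column 3.
d <- aggregate(df$date, by = list(df$id, df$type), min)
names(d)[3] <- "earliestdateidtype"

# Step 2: join the summary back; by.x / by.y are character vectors, not lists.
df2 <- merge(df, d, by.x = c("id", "type"), by.y = c("Group.1", "Group.2"))
df2
```

`merge()` returns the rows sorted by the join columns, so the row order can differ from `df`, but each row carries the same earliest date as the desired column in the question.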
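We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)), order the rows by 'date', and, grouped by 'id' (or by 'id' and 'type'), assign (:=) the first element of 'date' as the new 'earliestdateid' (or 'earliestdateidtype') column. Because the rows are ordered by date, the first element in each group is that group's earliest date.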
library(data.table)
setDT(df)[order(date), earliestdateid := date[1], by = id
][order(date), earliestdateidtype := date[1], by = .(id, type)]
df
#    id       date type earliestdateid earliestdateidtype
# 1:  1 2008-01-23    A     2007-11-01         2007-11-30
# 2:  1 2007-11-01    B     2007-11-01         2007-11-01
# 3:  1 2007-11-30    A     2007-11-01         2007-11-30
# 4:  3 2007-12-17    B     2007-12-17         2007-12-17
# 5:  3 2008-12-12    B     2007-12-17         2007-12-17
A similar approach with dplyr is:
library(dplyr)
df %>%
  group_by(id) %>%
  arrange(date) %>%
  mutate(earliestdateid = first(date)) %>%
  # .add = TRUE adds 'type' to the existing 'id' grouping (dplyr < 1.0.0 used add = TRUE)
  group_by(type, .add = TRUE) %>%
  mutate(earliestdateidtype = first(date))
NOTE: Both approaches avoid doing this in two steps, i.e. computing a summarised output and then joining it back to the original data.
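For comparison, the same one-step idea also works in base R without extra packages, using `ave()` (a swapped-in technique, not part of the answer above). `ave()` applies a function within each group and returns a vector aligned with the original rows, so no summary table or join is needed. A minimal sketch, assuming a plain data.frame built from the question's sample data:

```r
# Question's sample data
id <- c(1, 1, 1, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
type <- c("A", "B", "A", "B", "B")
df <- data.frame(id, date, type)
df$date <- as.Date(as.character(df$date), format = "%d-%m-%y")

# Earliest date per id: ave() returns one value per row, in the original order.
df$earliestdateid <- ave(df$date, df$id, FUN = min)

# Earliest date per (id, type). interaction(..., drop = TRUE) keeps only the
# combinations that occur, so min() is never called on an empty group.
df$earliestdateidtype <- ave(df$date, interaction(df$id, df$type, drop = TRUE), FUN = min)
df
```

Further grouping columns, such as the third column mentioned in the question, can be passed as extra arguments to `interaction()`.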