R aggregate based on multiple columns and then merge into dataframe?

Soran

I have a dataframe that looks like:

id <- c(1, 1, 1, 3, 3)
date <- c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08")
type <- c("A", "B", "A", "B", "B")
df <- data.frame(id, date, type)
# Parse the day-month-year strings into Date objects
df$date <- as.Date(as.character(df$date), format = "%d-%m-%y")

I want to add a new column that contains the earliest date for each combination of ID and type. My first attempt works fine, but it aggregates and merges on the ID only:

d = aggregate(df$date, by=list(df$id), min)
df2 = merge(df, d, by.x="id", by.y="Group.1")

What I want, though, is to group by type as well and get this result:

data.frame(df2, desired=c("2007-11-30","2007-11-01", "2007-11-30","2007-12-17","2007-12-17"))

I've tried a lot of possibilities. I really think it can be done with lists, but I'm at a loss as to how...

d = aggregate(df$date, by=list(df$id, df$type), min)

# And merge the result of aggregate with the original data frame
df2 = merge(df,d,by.x=list("id","type"),by.y=list("Group.1","Group.2"))

For this simple example I could split the types into their own data frames, build the new column in each, and then combine the two results. In reality, though, there are many types and a third column that has to be grouped in the same way, so that would not be practical...

Thank you!

akrun

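We can use data.table. Convert the 'data.frame' to a 'data.table' by reference (setDT(df)), order by 'date', and, grouped by 'id' (or by 'id' and 'type'), assign (:=) the first element of 'date' to a new column. Because the rows are ordered by 'date' inside the call, the first element in each group is the earliest date.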

library(data.table)
setDT(df)[order(date), earliestdateid := date[1], by = id
    ][order(date), earliestdateidtype := date[1], by = .(id, type)]
df
#    id       date type earliestdateid earliestdateidtype
#1:  1 2008-01-23    A     2007-11-01         2007-11-30
#2:  1 2007-11-01    B     2007-11-01         2007-11-01
#3:  1 2007-11-30    A     2007-11-01         2007-11-30
#4:  3 2007-12-17    B     2007-12-17         2007-12-17
#5:  3 2008-12-12    B     2007-12-17         2007-12-17

A similar approach with dplyr is

library(dplyr)
df %>%
   group_by(id) %>%
   arrange(date) %>%
   mutate(earliestdateid = first(date)) %>%
   group_by(type, .add = TRUE) %>%   # dplyr >= 1.0.0; older versions used add = TRUE
   mutate(earliestdateidtype = first(date))

NOTE: This avoids doing the work in two steps, i.e. computing a summarised output and then joining it back to the original data.
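
If the two-step aggregate-then-merge route from the question is still preferred, a minimal base R sketch could look like the one below. The `merge()` call in the question fails because `by.x` and `by.y` expect character vectors (built with `c()`), not lists. The names `keys` and `earliest` are just illustrative choices, not anything from the original post.

```r
# Rebuild the example data from the question
id   <- c(1, 1, 1, 3, 3)
date <- as.Date(c("23-01-08", "01-11-07", "30-11-07", "17-12-07", "12-12-08"),
                format = "%d-%m-%y")
type <- c("A", "B", "A", "B", "B")
df   <- data.frame(id, date, type)

# Grouping columns; a third key column could be appended here (assumption)
keys <- c("id", "type")

# Step 1: earliest date per combination of the key columns
d <- aggregate(df$date, by = df[keys], FUN = min)
names(d)[names(d) == "x"] <- "earliest"   # aggregate() names the result "x"

# Step 2: join back, passing the keys as a character vector
df2 <- merge(df, d, by = keys)
df2
```

Note that `merge()` sorts the result by the key columns, so the row order of `df2` differs from that of `df`.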
