I am fairly new to R (new to this site as well) and trying to understand how to aggregate data across columns in a situation where there is more than 1 identifier (in this case, two: PERSON_ID and PRODUCT_ID).
Please see my example below. To the right of the two identifiers within my data frame are five columns containing weekly sales figures. I need to aggregate the weekly data so that:
1: Week columns with the same name are summed (typically this is something I can easily accomplish in MS Excel using the sumif/sumifs function)
2: Any rows containing the same PERSON_ID and PRODUCT_ID combination are summed as well.
In this particular case, notice that the week of 6/2/2017 appears in more than one column. Meanwhile, PERSON_ID 0003603 appears twice for the same PRODUCT_ID, 3024.
PERSON_ID PRODUCT_ID 6/23/2017 6/16/2017 6/9/2017 6/2/2017 6/2/2017
0003603 3024 10.000 5.000 4.000 3.000 2.000
0003603 3024 1.000 2.000 3.000 8.000 1.000
0007654 2111 8.000 3.000 2.000 1.000 0.000
0008885 3025 0.000 0.000 1.000 3.000 9.000
0950645 3024 6.000 5.000 4.000 3.000 2.000
My actual data frame contains in excess of 1 million records, so an approach using the data.table package would be ideal, as far as I can tell.
Can someone please shed some light on how to solve this particular problem in R?
melt
ing your data (reshaping long) is the way to go. If I understand what you're after correctly, it's simply:
x = fread('PERSON_ID PRODUCT_ID 6/23/2017 6/16/2017 6/9/2017 6/2/2017 6/2/2017
0003603 3024 10.000 5.000 4.000 3.000 2.000
0003603 3024 1.000 2.000 3.000 8.000 1.000
0007654 2111 8.000 3.000 2.000 1.000 0.000
0008885 3025 0.000 0.000 1.000 3.000 9.000
0950645 3024 6.000 5.000 4.000 3.000 2.000',
colClasses = c('character', 'character', rep('numeric', 5L)))
xmlt =
melt(x, id.vars = c('PERSON_ID', 'PRODUCT_ID'),
variable.name = 'week', value.name = 'sales')
xmlt[ , week := as.IDate(week, format = '%m/%d/%Y')]
xmlt[ , .(total_sales = sum(sales)),
keyby = .(PERSON_ID, PRODUCT_ID, week)]
PERSON_ID PRODUCT_ID week total_sales
# 1: 0003603 3024 2017-06-02 14
# 2: 0003603 3024 2017-06-09 7
# 3: 0003603 3024 2017-06-16 7
# 4: 0003603 3024 2017-06-23 11
# 5: 0007654 2111 2017-06-02 1
# 6: 0007654 2111 2017-06-09 2
# 7: 0007654 2111 2017-06-16 3
# 8: 0007654 2111 2017-06-23 8
# 9: 0008885 3025 2017-06-02 12
# 10: 0008885 3025 2017-06-09 1
# 11: 0008885 3025 2017-06-16 0
# 12: 0008885 3025 2017-06-23 0
# 13: 0950645 3024 2017-06-02 5
# 14: 0950645 3024 2017-06-09 4
# 15: 0950645 3024 2017-06-16 5
# 16: 0950645 3024 2017-06-23 6
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments