I have data spanning several years, with an ID and a corresponding amount for each record. Like this:
ID <- c(rep("A", 5), rep("B", 7), rep("C", 3))
amount <- c(sample(1:10000, 15))
Date <- c("2016-01-22","2016-07-25", "2016-09-22", "2017-10-22", "2017-01-02",
"2016-08-22", "2016-09-22", "2016-10-22", "2017-08-22", "2017-09-22", "2017-10-22", "2018-08-22",
"2016-10-22","2017-10-25", "2018-10-22")
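Now I want to analyze each ID for each year. Specifically, I am interested in amount. First, I want to know the total for each year. Then I also want to know the total for the first 11 months, the first 10 months, the first 9 months and the first 8 months of each year. To do this, I computed the cumsum per ID per year as follows: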
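library(data.table)

myData <- cbind(ID, amount, Date)  # cbind() coerces every column to character
myData <- as.data.table(myData)
myData[, amount := as.numeric(amount)]  # restore the numeric type so cumsum() works
# create cumsum per ID per year
myData[, Date := as.Date(Date, format = "%Y-%m-%d")]
setorder(myData, ID, Date)  # sort in place by ID, then by date
myData[, CumSum := cumsum(amount), by = .(ID, year(Date))]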
How can I summarize the data.table so that I get the columns amount9month, amount10month and amount11month for each ID in each year?
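Between cumsum, by and dcast, this is almost straightforward. The hardest part is handling the months that have no data at all. Because of that, this solution is not as short as it could otherwise be, but it does things the "data.table way" and avoids slow operations such as looping over rows.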
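To see why the empty months matter, here is a minimal sketch on made-up toy values (not the question's data, and the names `toy` and `wide` are only for illustration). Casting the running total directly leaves NA wherever an ID has no transaction in a given month, rather than carrying the running total forward.

```r
library(data.table)

# Toy data (hypothetical values): A has transactions in months 1 and 3, B only in month 2
toy <- data.table(
  ID    = c("A", "A", "B"),
  year  = 2016L,
  month = c(1L, 3L, 2L),
  amt   = c(100, 50, 70)
)

# Running total over the months that are actually present
toy[, amt.cum := cumsum(amt), by = .(ID, year)]

# Cast to wide format without filling the gaps first
wide <- dcast(toy, ID + year ~ paste0("amount", month, "month"), value.var = "amt.cum")

# A's value for month 2 is NA here, although its total for the first two months is 100
print(wide)
```

Filling in every ID, year and month combination before taking the cumulative sum avoids this.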
# Just sort the formatting out first
myData[, Date:=as.Date(Date)]
myData[, `:=`(amount = as.numeric(amount),
year = year(Date),
month = month(Date))]
bycols <- c('ID', 'year', 'month')
# Summarise all transactions for the same ID in the same month
summary <- myData[, .(amt = sum(amount)), by=bycols]
# Create a skeleton table with all possible combinations of ID, year and month, to fill in any gaps.
skeleton <- myData[, CJ(ID, year, month = 1:12, unique = TRUE)]
# Join the skeleton to the actual data, to recreate the data but with no gaps in
result.long <- summary[skeleton, on=bycols, allow.cartesian=TRUE]
# Running total within each ID and year; months with no data count as 0
result.long[, amt.cum:=cumsum(fcoalesce(amt, 0)), by=c('ID', 'year')]
# Cast the data into wide format to have one column per month
result.wide <- dcast(result.long, ID + year ~ paste0('amount',month,'month'), value.var='amt.cum')
Note: if you don't have fcoalesce, update your data.table package.
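If only a handful of cut-offs are needed, a more direct alternative is to sum conditionally per ID and year. This is a different technique from the skeleton join above, sketched under the assumption that the cut-offs are fixed at months 8 to 12. The toy values and the names `toy`, `cutoffs` and `res` are hypothetical.

```r
library(data.table)

# Toy data (hypothetical values)
toy <- data.table(
  ID     = c("A", "A", "A", "B"),
  amount = c(100, 50, 25, 70),
  Date   = as.Date(c("2016-01-22", "2016-09-22", "2016-12-01", "2017-10-22"))
)

cutoffs <- 8:12  # assumed month cut-offs; 12 gives the full-year total

# For each ID and year, sum the amounts dated in or before each cut-off month.
# Months without data need no special handling: an empty selection sums to 0.
res <- toy[, {
  m <- month(Date)
  totals <- lapply(cutoffs, function(k) sum(amount[m <= k]))
  setNames(totals, paste0("amount", cutoffs, "month"))
}, by = .(ID, year = year(Date))]

print(res)
```

Unlike the skeleton approach, this only returns rows for ID and year combinations that actually occur in the data.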