Summarising a data.table: creating multiple column subsets by date in R

Posted by debugcn in Dev

内卡

I have several years of data, consisting of an ID and a corresponding amount. Like this:

  ID <- c(rep("A", 5), rep("B", 7), rep("C", 3))
  amount <- c(sample(1:10000, 15))
  Date <- c("2016-01-22", "2016-07-25", "2016-09-22", "2017-10-22", "2017-01-02",
            "2016-08-22", "2016-09-22", "2016-10-22", "2017-08-22", "2017-09-22", "2017-10-22", "2018-08-22",
            "2016-10-22", "2017-10-25", "2018-10-22")

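Now I want to analyse each ID for each year. Specifically, I am interested in the amount. First, I want to know the total amount per year. Then I also want to know the total amount for the first 11 months, the first 10 months, the first 9 months and the first 8 months of each year. To do this, I computed the cumulative sum per ID per year as follows: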

  library(data.table)

  # Build the table directly; cbind() would coerce every column to character
  myData <- data.table(ID, amount, Date)

  # create cumsum per ID per year
  myData[, Date := as.Date(Date, format = "%Y-%m-%d")]
  setorder(myData, ID, Date)  # sort in place so the cumulative sum runs in date order
  myData[, CumSum := cumsum(amount), by = .(ID, year(Date))]

How can I summarise the data.table so that I get the columns amount9month, amount10month and amount11month for each ID and each year?

马特

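Between cumsum, by and dcast, this is almost really simple. The hardest part is dealing with the months that have no data at all. As a result, this solution is not quite as short as it could be, but it does things "the data.table way" and avoids slow operations such as looping over rows.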

# Just sort the formatting out first
myData[, Date:=as.Date(Date)]
myData[, `:=`(amount = as.numeric(amount),
              year = year(Date),
              month = month(Date))]
bycols <- c('ID', 'year', 'month')

# Summarise all transactions for the same ID in the same month
summary <- myData[, .(amt = sum(amount)), by=bycols]

# Create a skeleton table with all possible combinations of ID, year and month, to fill in any gaps.
skeleton <- myData[, CJ(ID, year, month = 1:12, unique = TRUE)]

# Join the skeleton to the actual data, to recreate the data but with no gaps in
result.long <- summary[skeleton, on=bycols, allow.cartesian=TRUE]
result.long[, amt.cum:=cumsum(fcoalesce(amt, 0)), by=c('ID', 'year')]

# Cast the data into wide format to have one column per month
result.wide <- dcast(result.long, ID + year ~ paste0('amount',month,'month'), value.var='amt.cum')
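
The question only asks for some of the month columns, so a final selection step could follow the cast. The sketch below is one way to do that, not part of the original answer. To run standalone it rebuilds a tiny made-up table (`toy`, with illustrative values rather than the question's data), repeats the steps above in condensed form, and then keeps the 8 to 11 month columns plus the 12 month column, which equals the yearly total. The names `toy`, `monthly`, `final` and `amountYear` are hypothetical.

```r
library(data.table)

# Hypothetical toy data: ID "A" has no transactions from February to August 2016
toy <- data.table(
  ID     = c("A", "A", "A", "B"),
  amount = c(100, 50, 25, 10),
  Date   = as.Date(c("2016-01-22", "2016-09-22", "2016-12-01", "2016-10-05"))
)
toy[, `:=`(year = year(Date), month = month(Date))]
bycols <- c("ID", "year", "month")

# Same steps as above, condensed
monthly  <- toy[, .(amt = sum(amount)), by = bycols]
skeleton <- toy[, CJ(ID = ID, year = year, month = 1:12, unique = TRUE)]
long     <- monthly[skeleton, on = bycols]
long[, amt.cum := cumsum(fcoalesce(amt, 0)), by = c("ID", "year")]
wide <- dcast(long, ID + year ~ paste0("amount", month, "month"),
              value.var = "amt.cum")

# Keep only the requested columns; amount12month is the full-year total
keep  <- c("ID", "year", paste0("amount", 8:12, "month"))
final <- wide[, ..keep]
setnames(final, "amount12month", "amountYear")  # hypothetical column name
print(final)
```

For ID A the cumulative total stays at 100 through August, because the months without transactions contribute zero after the join against the skeleton.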

Note: if you don't have fcoalesce, update your data.table package (the function was added in version 1.12.4).
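
If updating is not an option, a base R expression can stand in for fcoalesce when filling the gaps with zero. This is a minimal sketch on a made-up table `dt` (hypothetical, not the question's data), showing only the cumulative-sum step:

```r
library(data.table)

# Hypothetical monthly totals with gaps (NA = no transactions that month)
dt <- data.table(ID = "A", month = 1:4, amt = c(100, NA, NA, 50))

# Base R stand-in for fcoalesce(amt, 0): replace NA with 0 before accumulating
dt[, amt.cum := cumsum(ifelse(is.na(amt), 0, amt)), by = ID]
print(dt)
```

ifelse is slower than fcoalesce on large tables, so this is best treated as a fallback only.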


Edited on 2021-04-2
