I am looking for a way to determine the parent of each subgroup in the data.table below.
    Group SubGroup Level Parent
 1:     A       A1     0     NA
 2:     A       A2     1     A1
 3:     A       A3     1     A1
 4:     A       A4     2     A3
 5:     A       A5     3     A4
 6:     A       A6     3     A4
 7:     A       A7     3     A4
 8:     A       A8     2     A3
 9:     A       A9     2     A3
10:     A      A10     2     A3
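This is the calculation I am currently using, but I would like to know whether there is a better way. My actual dataset contains multiple groups, so I would also like to add a by= argument to the calculation. It can be assumed that the parent of a row is the subgroup with the largest row index that is smaller than the current row's index and whose Level is lower than the current row's Level.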
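library(data.table)

tmp = data.table(Group = "A", SubGroup = paste0("A", 1:10),
                 Level = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2))
# for each row x, pick the last row up to x whose Level is lower than row x's;
# when there is none, max() warns (suppressed here) and the Parent ends up NA
tmp[, Parent := sapply(1:nrow(tmp), function(x)
  tmp[, SubGroup[(suppressWarnings(max(which(Level[1:x] < Level[x]))))]])]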
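As for the by= part of the question, one way to get there is to move the per-row lookup into a function of two vectors so that it no longer refers to tmp directly. The following is only a sketch under assumptions: find_parent is a hypothetical helper name, and group B and its levels are made-up data for illustration. It is still quadratic within each group, so it is not a performance fix.

```r
library(data.table)

# Hypothetical helper: for each position i, return the subgroup at the largest
# earlier index whose level is strictly lower than level[i], or NA if none.
find_parent <- function(level, subgroup) {
  vapply(seq_along(level), function(i) {
    idx <- which(level[seq_len(i - 1L)] < level[i])
    if (length(idx) == 0L) NA_character_ else subgroup[max(idx)]
  }, character(1))
}

# Made-up example data with two groups
tmp <- data.table(
  Group    = rep(c("A", "B"), each = 5),
  SubGroup = c(paste0("A", 1:5), paste0("B", 1:5)),
  Level    = c(0, 1, 1, 2, 3, 0, 1, 2, 2, 1)
)

# The lookup now only sees the rows of one group at a time
tmp[, Parent := find_parent(Level, SubGroup), by = Group]
print(tmp)
```

Because the helper returns NA_character_ explicitly when no earlier row qualifies, the suppressWarnings() call is no longer needed.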
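One option is a rolling join on a helper run-id column (the example data adds an 11th row, A11, that returns to Level 3):
library(data.table)

dt = data.table(Group = "A", SubGroup = paste0("A", 1:11),
                Level = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2, 3))
# need another grouping layer, to satisfy the row requirements:
# rowGroup increases by one every time Level changes within a Group
dt[, rowGroup := cumsum(c(0, diff(Level) != 0)), by = Group]
# get the parent for each Level and rowGroup:
# the last subgroup of a run is the candidate parent for the next Level down
parents = dt[, .(Level = Level[.N] + 1, Parent = SubGroup[.N]), by = .(Group, rowGroup)]
setkey(parents, Group, Level, rowGroup)
setkey(dt, Group, Level, rowGroup)
# rolling merge that matches to previous rowGroup
# (exact match on Group and Level, roll on the last key column rowGroup)
parents[dt, roll = TRUE][order(Group, rowGroup)]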
#    Group rowGroup Level Parent SubGroup
# 1:     A        0     0     NA       A1
# 2:     A        1     1     A1       A2
# 3:     A        1     1     A1       A3
# 4:     A        2     2     A3       A4
# 5:     A        3     3     A4       A5
# 6:     A        3     3     A4       A6
# 7:     A        3     3     A4       A7
# 8:     A        4     2     A3       A8
# 9:     A        4     2     A3       A9
#10:     A        4     2     A3      A10
#11:     A        5     3    A10      A11
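If the parent should end up as a column of the original table, and the data has more than one group, the rolling join above can also be written without re-keying dt. This is a sketch under assumptions rather than a tested drop-in: group B and its levels are made up for illustration, rleid() stands in for the cumsum/diff run id (it starts at 1 instead of 0, which does not change the roll), and on= replaces setkey so that dt keeps its original row order.

```r
library(data.table)

# Group A is the example data from above; group B is made up for illustration
dt <- data.table(
  Group    = rep(c("A", "B"), times = c(11, 4)),
  SubGroup = c(paste0("A", 1:11), paste0("B", 1:4)),
  Level    = c(0, 1, 1, 2, 3, 3, 3, 2, 2, 2, 3,
               0, 1, 2, 1)
)

# Run id of consecutive equal Levels, restarted within each Group
dt[, rowGroup := rleid(Level), by = Group]

# The last subgroup of each run is the candidate parent for the next Level down
parents <- dt[, .(Level = Level[.N] + 1, Parent = SubGroup[.N]),
              by = .(Group, rowGroup)]

# Exact match on Group and Level, rolling on the last join column (rowGroup);
# the result has one row per row of dt, in dt's order
matched <- parents[dt, on = .(Group, Level, rowGroup), roll = TRUE]

# Write the matched parent back onto dt by reference
dt[, Parent := matched$Parent]
print(dt)
```

Like the keyed version, this only looks for a parent exactly one Level above the current row, so it relies on Level never increasing by more than one from a row to the next.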