I have the following data structure in data.table format:
ID Cycle Cycle_Day Cycle_Date Positive_Test_Date
1 1 1 3/28/2019 NA
1 1 2 3/29/2019 NA
1 1 3 3/30/2019 NA
1 1 NA NA 3/29/2019
1 2 1 4/23/2019 NA
1 2 2 4/24/2019 NA
1 2 3 4/25/2019 NA
1 2 NA NA 4/25/2019
2 1 1 3/18/2019 NA
2 1 2 3/19/2019 NA
2 1 3 3/20/2019 NA
2 1 NA NA 3/18/2019
2 2 1 4/23/2019 NA
2 2 2 4/24/2019 NA
2 2 3 4/25/2019 NA
2 2 NA NA 4/24/2019
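I want to create a new column "LH_Date". For each ID and each Cycle, it should copy the date into the row where Cycle_Date matches Positive_Test_Date. In every other row the value should be NA. This is what the result should look like: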
ID Cycle Cycle_Day Cycle_Date Positive_Test_Date LH_Date
1 1 1 3/28/2019 NA NA
1 1 2 3/29/2019 NA 3/29/2019
1 1 3 3/30/2019 NA NA
1 1 NA NA 3/29/2019 NA
1 2 1 4/23/2019 NA NA
1 2 2 4/24/2019 NA NA
1 2 3 4/25/2019 NA 4/25/2019
1 2 NA NA 4/25/2019 NA
2 1 1 3/18/2019 NA 3/18/2019
2 1 2 3/19/2019 NA NA
2 1 3 3/20/2019 NA NA
2 1 NA NA 3/18/2019 NA
2 2 1 4/23/2019 NA NA
2 2 2 4/24/2019 NA 4/24/2019
2 2 3 4/25/2019 NA NA
2 2 NA NA 4/24/2019 NA
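The rule can be stated per group: a row gets its own Cycle_Date when that date appears among the group's non-missing Positive_Test_Date values, and NA otherwise. Below is a minimal sketch of that rule using `fifelse()` with `%in%`. This is a different technique from the index-based answer that follows. It assumes the dates stay as character strings in the same m/d/Y format, and it runs on a small hypothetical one-group subset of the data.

```r
library(data.table)

# Hypothetical one-group subset of the question's data (ID 1, cycle 1)
DT <- data.table(
  ID = 1L, Cycle = 1L,
  Cycle_Day = c(1L, 2L, 3L, NA),
  Cycle_Date = c("3/28/2019", "3/29/2019", "3/30/2019", NA),
  Positive_Test_Date = c(NA, NA, NA, "3/29/2019")
)

# Within each ID/Cycle group, keep Cycle_Date only where it equals one of
# the group's non-missing Positive_Test_Date values; otherwise NA.
DT[, LH_Date := fifelse(
  !is.na(Cycle_Date) & Cycle_Date %in% na.omit(Positive_Test_Date),
  Cycle_Date, NA_character_
), by = .(ID, Cycle)]

print(DT)
```

The `!is.na(Cycle_Date)` guard matters here because `%in%` treats `NA` as matching `NA`. Without the guard, the test for the Positive_Test_Date row would not be a clean FALSE.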
Another option is to use row indices to find the rows that satisfy the condition, and then update only those rows:
#for each group of ID and Cycle,
#find the row indices where Cycle_Date equals the last Positive_Test_Date
idxDT <- DT[, .I[Cycle_Date==Positive_Test_Date[.N]], .(ID, Cycle)]
#for those row indices, set LH_Date to Cycle_Date
#(excluded rows, and groups whose index is NA, are left as NA by design in data.table)
DT[idxDT$V1, LH_Date := Cycle_Date]
idxDT
idxDT looks like this, and idxDT$V1 extracts column V1:
ID Cycle V1
1: 1 1 2
2: 1 1 NA
3: 1 2 7
4: 1 2 NA
5: 2 1 9
6: 2 1 NA
7: 2 2 14
8: 2 2 NA
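The NA entries in V1 follow from base R's comparison and indexing rules. Comparing a string with a missing Cycle_Date yields NA, and indexing a vector with a logical NA returns an NA element. The sketch below reproduces this for one group. The variable names are hypothetical, and `1:4` stands in for that group's `.I`.

```r
# One group's Cycle_Date column; the last row is the Positive_Test_Date row
cycle_date <- c("3/28/2019", "3/29/2019", "3/30/2019", NA)
positive   <- "3/29/2019"         # the group's last Positive_Test_Date

hit  <- cycle_date == positive    # FALSE TRUE FALSE NA
rows <- (1:4)[hit]                # 2 NA: the logical NA selects an NA element
print(hit)
print(rows)
```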
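.I holds the row indices in data.table. From ?.I:
.I is an integer vector equal to seq_len(nrow(x)). While grouping, it holds for each item in the group its row location in x. This is useful to subset in j; e.g. DT[, .I[which.max(somecol)], by = grp].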
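The sketch below illustrates the `.I[which.max(somecol)]` pattern from the help page on a hypothetical toy table (`toy`, `grp` and `somecol` are illustrative names). `.I` keeps each row's position in the full table, so the per-group result can be used directly to index back into it.

```r
library(data.table)

# Hypothetical toy table: two groups spread over row positions 1..5
toy <- data.table(grp = c("a", "a", "b", "b", "b"),
                  somecol = c(3, 7, 1, 9, 4))

# For each group, the row number (in toy) of that group's largest somecol
best <- toy[, .I[which.max(somecol)], by = grp]
print(best)

# The returned row numbers index back into the full table
print(toy[best$V1])
```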
Output:
ID Cycle Cycle_Day Cycle_Date Positive_Test_Date LH_Date
1: 1 1 1 3/28/2019 <NA> <NA>
2: 1 1 2 3/29/2019 <NA> 3/29/2019
3: 1 1 3 3/30/2019 <NA> <NA>
4: 1 1 NA <NA> 3/29/2019 <NA>
5: 1 2 1 4/23/2019 <NA> <NA>
6: 1 2 2 4/24/2019 <NA> <NA>
7: 1 2 3 4/25/2019 <NA> 4/25/2019
8: 1 2 NA <NA> 4/25/2019 <NA>
9: 2 1 1 3/18/2019 <NA> 3/18/2019
10: 2 1 2 3/19/2019 <NA> <NA>
11: 2 1 3 3/20/2019 <NA> <NA>
12: 2 1 NA <NA> 3/18/2019 <NA>
13: 2 2 1 4/23/2019 <NA> <NA>
14: 2 2 2 4/24/2019 <NA> 4/24/2019
15: 2 2 3 4/25/2019 <NA> <NA>
16: 2 2 NA <NA> 4/24/2019 <NA>
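As a standalone check of the two steps, here is a sketch on a two-group subset of the data (ID 1, cycles 1 and 2). One change is our own assumption and not part of the answer above: NA indices are dropped explicitly before the update instead of relying on `:=` to skip them.

```r
library(data.table)

# Two-group subset of the question's data (ID 1, cycles 1 and 2)
DT <- data.table(
  ID = 1L,
  Cycle = rep(1:2, each = 4),
  Cycle_Day = rep(c(1L, 2L, 3L, NA), 2),
  Cycle_Date = c("3/28/2019", "3/29/2019", "3/30/2019", NA,
                 "4/23/2019", "4/24/2019", "4/25/2019", NA),
  Positive_Test_Date = c(NA, NA, NA, "3/29/2019",
                         NA, NA, NA, "4/25/2019")
)

# Step 1: per ID/Cycle group, row numbers (.I) whose Cycle_Date equals
# the group's last Positive_Test_Date (NA where the comparison is NA)
idxDT <- DT[, .I[Cycle_Date == Positive_Test_Date[.N]], by = .(ID, Cycle)]

# Step 2: keep only the non-missing indices and assign those rows;
# := fills every other row of the new column with NA
rows <- idxDT$V1[!is.na(idxDT$V1)]
DT[rows, LH_Date := Cycle_Date]

print(DT)
```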
Data:
library(data.table)
DT <- fread("ID Cycle Cycle_Day Cycle_Date Positive_Test_Date
1 1 1 3/28/2019 NA
1 1 2 3/29/2019 NA
1 1 3 3/30/2019 NA
1 1 NA NA 3/29/2019
1 2 1 4/23/2019 NA
1 2 2 4/24/2019 NA
1 2 3 4/25/2019 NA
1 2 NA NA 4/25/2019
2 1 1 3/18/2019 NA
2 1 2 3/19/2019 NA
2 1 3 3/20/2019 NA
2 1 NA NA 3/18/2019
2 2 1 4/23/2019 NA
2 2 2 4/24/2019 NA
2 2 3 4/25/2019 NA
2 2 NA NA 4/24/2019")