Let's say that we have this dataframe:
x<- as.data.frame(cbind(c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5)))
colnames(x)<- c("ID", "Visit", "Time", "State")
Column ID indicates subject ID.
Column Visit indicates a series of visits.
Column Time indicates the time that has passed to reach a certain "State".
Column State indicates severity of a certain disease, where 5 means death. That means that you can fluctuate from worse states to better states, but you can never improve from category 5, since you are dead.
I would like to identify only those subjects that improved from category 5 to a better one, since these are errors in the dataframe (i.e. rows 13 and 16).
Additionally, I would like to remove those rows where a subject seems to have died more than once (i.e. row 18).
I asked a similar question before, but it was very general and it implied that all fluctuations to a better state should be removed from the dataset, which is not what I actually want.
The OP has modified the question substantially: all rows that appear after the first occurrence of State 5 (death) are now to be considered erroneous. This includes false recoveries (as in rows 13 and 16) as well as "duplicated deaths" (as in rows 17 and 18).
An answer to this requires a completely different approach. One possibility is to use a non-equi join:
library(data.table)
setDT(x)[x[, first(Visit[State == 5]), by = ID], on = .(ID, Visit > V1), error := TRUE][]
    ID Visit Time State error
 1:  A     1 10.0     1    NA
 2:  A     2 12.5     3    NA
 3:  A     3 15.0     4    NA
 4:  B     1  2.0     1    NA
 5:  B     2  3.4     2    NA
 6:  B     3  5.7     3    NA
 7:  B     2  8.0     4    NA
 8:  B     3  9.5     3    NA
 9:  C     1  1.0     2    NA
10:  C     2  5.6     2    NA
11:  C     3  8.9     3    NA
12:  C     4 10.0     5    NA
13:  C     5 11.0     2  TRUE
14:  D     1  2.0     3    NA
15:  D     2  3.4     5    NA
16:  D     3  6.0     4  TRUE
17:  D     4  8.0     5  TRUE
18:  D     5 10.5     5  TRUE
The number of the first visit with State 5 is returned by
x[, first(Visit[State == 5]), by = ID]
   ID V1
1:  C  4
2:  D  2
In the subsequent non-equi join, only those rows are marked which appear after the first State 5 event.
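The question also asks for the erroneous rows to be removed, not only marked. A minimal sketch of that last step, assuming the data is defined with numeric Visit and State columns; the names first_death and cleaned are introduced here for illustration only and are not part of the original answer:

```r
library(data.table)

x <- data.table(
  ID    = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
  Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
  Time  = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
  State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))

# First visit with State 5 per subject (illustrative name);
# subjects that never reach State 5 yield no row here
first_death <- x[, first(Visit[State == 5]), by = ID]

# Same non-equi join as above: mark rows whose Visit lies after that visit
x[first_death, on = .(ID, Visit > V1), error := TRUE]

# Keep only the unmarked rows and drop the helper column from the result
cleaned <- x[is.na(error)][, error := NULL][]
cleaned
```

The filter x[is.na(error)] returns a new data.table, so x itself keeps the error column in case the marked rows need to be inspected before they are discarded.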
Data (with numeric Visit, Time, and State columns):
x <- data.frame(
ID = c("A","A","A","B","B","B","B","B","C","C","C","C","C","D","D","D","D","D"),
Visit = c(1,2,3,1,2,3,2,3,1,2,3,4,5,1,2,3,4,5),
Time = c(10,12.5,15,2,3.4,5.7,8,9.5,1,5.6,8.9,10,11,2,3.4,6,8,10.5),
State = c(1,3,4,1,2,3,4,3,2,2,3,5,2,3,5,4,5,5))