I am trying to sum consecutive values that have the same student_id and are ordered by date.
I have tried using sequence(rle()) but the order by dates is not working.
row student_id date pass/fail streak
1 2 2019-05-24 0 0
2 2 2019-05-25 -1 -1
3 1 2019-05-24 1 2
4 1 2019-05-28 -1 -1
5 2 2019-05-23 1 1
6 1 2019-05-27 1 3
7 2 2019-05-28 -1 -2
8 1 2019-05-23 1 1
The streak column is the desired output, and the dates are not in order in the original data frame which I think is what is the problem.
Using the ordering indexes o
, order the original data frame by student_id
and date
. Now, apply cumsum
by student_id
and pass/fail
to pass/fail
and finally revert to the original order.
library(data.table)
o <- with(DF, order(student_id, date))
transform(DF[o, ],
streak = ave(`pass/fail`, rleid(student_id, `pass/fail`), FUN = cumsum))[order(o), ]
giving:
ow student_id date pass.fail streak
1 1 2 2019-05-24 0 0
2 2 2 2019-05-25 -1 -1
3 3 1 2019-05-24 1 2
4 4 1 2019-05-28 -1 -1
5 5 2 2019-05-23 1 1
6 6 1 2019-05-27 1 3
7 7 2 2019-05-28 -1 -2
8 8 1 2019-05-23 1 1
The input in reproducible form:
Lines <- "ow student_id date pass/fail streak
1 2 2019-05-24 0 0
2 2 2019-05-25 -1 -1
3 1 2019-05-24 1 2
4 1 2019-05-28 -1 -1
5 2 2019-05-23 1 1
6 1 2019-05-27 1 3
7 2 2019-05-28 -1 -2
8 1 2019-05-23 1 1"
DF <- read.table(text = Lines, header = TRUE, check.names = FALSE)
DF$date <- as.Date(DF$date)
DF$streak <- NULL
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments