I have a dataframe, DATA
, containing information on students and their test dates. I want to create a variable called WANT
, where for all the STUDENT
s you count the unique months (not the unique rows) as shown in the sample of the WANT variable below:
library(dplyr)
set.seed(0)
DATA <- data.frame("STUDENT" = sample(1:5, 100, r = T),
"TESTDATE" = sample(seq(as.Date('2010/01/01'), as.Date('2010/12/31'), by="day"), 100, r=T))
DATA <- DATA %>% arrange(STUDENT, TESTDATE)
DATA$WANT <- c(1,1,1,2,2,3,3,4,4,5,5,6,7,7,8,8,9,1,1,1,2,3,3,4,5,5,6,7,8,8,9,10,10, rep(NA, 67))
My attempt only does rows and it's not what I wish for
DATA %>% group_by(STUDENT) %>% mutate(WANT = 1:n())
We may extract the month part and use match
or as.integer(factor(WANT2 levels = unique(WANT2)))
library(dplyr)
out <- DATA %>%
group_by(STUDENT) %>%
mutate(WANT2 = as.integer(format(TESTDATE, '%m')),
WANT2 = match(WANT2, unique(WANT2))) %>%
ungroup
-output
> head(out, 22) %>% as.data.frame
STUDENT TESTDATE WANT WANT2
1 1 2010-01-20 1 1
2 1 2010-01-31 1 1
3 1 2010-02-10 2 2
4 1 2010-02-10 2 2
5 1 2010-03-27 3 3
6 1 2010-04-20 4 4
7 1 2010-04-21 4 4
8 1 2010-05-02 5 5
9 1 2010-05-06 5 5
10 1 2010-05-13 5 5
11 1 2010-05-20 5 5
12 1 2010-06-17 6 6
13 1 2010-08-22 7 7
14 1 2010-08-25 7 7
15 1 2010-08-27 7 7
16 1 2010-08-30 7 7
17 1 2010-09-06 8 8
18 1 2010-09-30 8 8
19 1 2010-10-27 9 9
20 1 2010-10-31 9 9
21 1 2010-12-10 10 10
22 1 2010-12-21 10 10
If we want the year-month to count as separate, then do
out <- DATA %>%
group_by(STUDENT) %>%
mutate(WANT2 = format(TESTDATE, '%Y-%m'),
WANT2 = match(WANT2, unique(WANT2))) %>%
ungroup
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments