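I want to determine the time difference between the timestamps of each userID. I only want to measure the difference for users that have both a login and a logout status. Some users do not have both statuses (for example, they have not logged out yet); for them, I just want to mark the result as NA.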
Some data:
library(dplyr)
start <- as.POSIXct("2012-01-15")
interval <- 70
end <- start + as.difftime(1, units="days")
tseq <- seq(from = start, by = interval * 70, to = end)
employeID <- c("1_e","1_e","2_b","2_b","3_c","3_c","100_c","4_d","4_d","52_f","9_f","9_f","7_u","7_u","10_5","22_2","33_a","33_a")
status <- c("login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","login","logout","logout","login","logout","login")
# put together
data <- data.frame(tseq, employeID, status)
print(data)
#                   tseq employeID status
# 1  2012-01-15 00:00:00       1_e  login
# 2  2012-01-15 01:21:40       1_e logout
# 3  2012-01-15 02:43:20       2_b  login
# 4  2012-01-15 04:05:00       2_b logout
# 5  2012-01-15 05:26:40       3_c  login
# 6  2012-01-15 06:48:20       3_c logout
# 7  2012-01-15 08:10:00     100_c  login
# 8  2012-01-15 09:31:40       4_d logout
# 9  2012-01-15 10:53:20       4_d  login
# 10 2012-01-15 12:15:00      52_f logout
# 11 2012-01-15 13:36:40       9_f  login
# 12 2012-01-15 14:58:20       9_f logout
# 13 2012-01-15 16:20:00       7_u  login
# 14 2012-01-15 17:41:40       7_u logout
# 15 2012-01-15 19:03:20      10_5 logout
# 16 2012-01-15 20:25:00      22_2  login
# 17 2012-01-15 21:46:40      33_a logout
# 18 2012-01-15 23:08:20      33_a  login
test <- data %>%
  group_by(employeID) %>%
  mutate(time.difference = tseq - lag(tseq))
However, this only ever seems to produce a single time value.
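A quick check of the sample data suggests why: the timestamps are a regular sequence with a step of interval * 70 = 4900 seconds, so every gap between consecutive rows is identical. This minimal sketch rebuilds only the timestamp column; tz = "UTC" is an assumption added here so the check does not depend on the local time zone.

```r
# Rebuild the question's timestamps (tz = "UTC" is an added assumption).
start <- as.POSIXct("2012-01-15", tz = "UTC")
end   <- start + as.difftime(1, units = "days")
tseq  <- seq(from = start, by = 70 * 70, to = end)  # step of 4900 seconds

# Gaps between consecutive timestamps, converted to seconds.
gaps <- as.numeric(diff(tseq), units = "secs")
print(unique(gaps))  # a single value: 4900 s, i.e. 1 h 21 min 40 s

# Within one user, tseq - lag(tseq) is NA for the user's first row and
# otherwise one of these gaps, so every non-NA result is the same value.
```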
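How about this? Mainly, it looks like you were using mutate when you wanted summarise: mutate adds one value per input row, whereas summarise returns one row per group. I have also converted the status column from factor to character (data.frame() created factors by default before R 4.0.0) and included an ifelse statement so that only users with both a "login" and a "logout" entry get a value: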
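test <- data %>%
  # data.frame() created factors by default before R 4.0.0
  mutate(status = as.character(status)) %>%
  group_by(employeID) %>%
  summarise(time.difference = ifelse(
    # only users that have both a login and a logout entry get a value
    "login" %in% status && "logout" %in% status,
    # logout minus login; fix the units, because ifelse() strips the difftime
    # class and the default units = "auto" could differ from user to user
    difftime(tseq[status == "logout"], tseq[status == "login"], units = "hours"),
    NA
  ))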
This gives:
> head( test )
# A tibble: 6 × 2
employeID time.difference
<fctr> <dbl>
1 1_e 1.361111
2 10_5 NA
3 100_c NA
4 2_b 1.361111
5 22_2 NA
6 3_c 1.361111
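To illustrate the mutate versus summarise point on something smaller, here is a sketch on a hypothetical toy data frame (the names toy, id, per_row and per_user are made up for this example). The .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: user "a" has a login and a logout, user "b" only a login.
toy <- data.frame(
  id     = c("a", "a", "b"),
  status = c("login", "logout", "login"),
  stringsAsFactors = FALSE
)

# mutate() keeps one row per input row: 3 rows.
per_row <- toy %>% group_by(id) %>% mutate(n_events = n()) %>% ungroup()

# summarise() collapses each group to a single row: 2 rows.
per_user <- toy %>% group_by(id) %>% summarise(n_events = n(), .groups = "drop")

print(nrow(per_row))
print(nrow(per_user))
```

Since the goal is one time difference per user, the one-row-per-group shape of summarise is the one you want.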
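As others have pointed out, your sample data has a constant time interval, so every user who gets a value gets the same one. I assume your real data looks different, in which case you will get more meaningful output.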
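One more thing to watch for: in the sample data, 4_d and 33_a log out before they log in, so the code above gives them a negative difference (-1.361111). Below is a sketch on hypothetical, irregularly spaced data (not the question's data) that reports minutes and, as an assumed policy, treats a logout that precedes its login as NA. It also assumes at most one login and one logout per user and dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical sessions with irregular gaps (not the question's data).
events <- data.frame(
  employeID = c("1_e", "1_e", "2_b", "2_b", "3_c", "4_d", "4_d"),
  tseq = as.POSIXct(c("2012-01-15 08:00:00", "2012-01-15 09:30:00",
                      "2012-01-15 08:15:00", "2012-01-15 08:20:00",
                      "2012-01-15 10:00:00",
                      "2012-01-15 12:00:00", "2012-01-15 11:00:00"),
                    tz = "UTC"),
  status = c("login", "logout", "login", "logout", "login",
             "login", "logout"),
  stringsAsFactors = FALSE
)

session_minutes <- events %>%
  group_by(employeID) %>%
  summarise(
    time.difference = if (all(c("login", "logout") %in% status)) {
      # Assumes at most one login and one logout per user; takes the first of each.
      as.numeric(difftime(tseq[status == "logout"][1],
                          tseq[status == "login"][1],
                          units = "mins"))
    } else {
      NA_real_  # user is missing either the login or the logout entry
    },
    .groups = "drop"
  ) %>%
  # Assumed policy: a logout recorded before its login is treated as invalid.
  mutate(time.difference = ifelse(time.difference < 0, NA_real_, time.difference))

print(session_minutes)
```

Using if/else inside summarise evaluates only the branch that applies, and returning NA_real_ keeps the column numeric for every group.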