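I have a dataset, df:
Ultimately, I would like to group the data into "chunks" wherever the Folder column contains the string "Out", while also taking into account the DATE and the associated empty Message values. Is there a way to create one chunk for each consecutive run of "Out" rows that have an empty Message, and compute the duration of each chunk?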
Folder   DATE                Message
Outdata  9/9/2019 5:46:00
Outdata  9/9/2019 5:46:01
Outdata  9/9/2019 5:46:02
In       9/9/2019 5:46:03    hello
In       9/9/2019 5:46:04    hello
Outdata  9/10/2019 6:00:01
Outdata  9/10/2019 6:00:02
In       9/11/2019 7:50:00   hello
In       9/11/2019 7:50:01   hello
I want this output:
New Variable  Duration  Message
Outdata1      2 sec
Outdata2      1 sec
I have included the dput:
dput(sample)
structure(list(Folder = structure(c(2L, 2L, 2L, 1L, 1L, 2L, 2L,
1L, 1L), .Label = c("In", "Outdata"), class = "factor"), Date = structure(c(5L,
6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L), .Label = c("9/10/2019 6:00:01 AM",
"9/10/2019 6:00:02 AM", "9/11/2019 7:50:00 AM", "9/11/2019 7:50:01 AM",
"9/9/2019 5:46:00 AM", "9/9/2019 5:46:01 AM", "9/9/2019 5:46:02 AM",
"9/9/2019 5:46:03 AM", "9/9/2019 5:46:04 AM"), class = "factor"),
Message = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("",
"hello"), class = "factor")), class = "data.frame", row.names = c(NA,
-9L))
Here is what I have tried. It works well; I just need to account for the empty Message values.
library(dplyr)

df %>%
  mutate(DATE = as.POSIXct(DATE, format = "%m/%d/%Y %I:%M:%S %p"),
         gr = cumsum(Folder != lag(Folder, default = TRUE))) %>%
  filter(Folder == "Out") %>%
  arrange(gr, DATE) %>%
  group_by(gr) %>%
  summarise(Duration = difftime(last(DATE), first(DATE), units = "secs")) %>%
  mutate(gr = paste0('Out', row_number()))
The code above works, but I am not sure how to add the condition that the Message in the row is empty (row == "").
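To make the grouping step in the attempt above easier to follow, here is a minimal base R sketch of the same idea: flag every row whose Folder differs from the previous row, then take the cumulative sum so that each consecutive run gets its own id. The vector names folder, changed, gr and runs are only illustrative, and the values are copied by hand from the sample table.

```r
# Folder values from the sample, as a plain character vector
folder <- c("Outdata", "Outdata", "Outdata", "In", "In",
            "Outdata", "Outdata", "In", "In")

# TRUE wherever the value differs from the previous row;
# the first row always starts a new run
changed <- c(TRUE, folder[-1] != folder[-length(folder)])

# The cumulative sum of the change flags yields one id per consecutive run
gr <- cumsum(changed)
print(gr)
# 1 1 1 2 2 3 3 4 4

# rle() describes the same runs as values plus run lengths
runs <- rle(folder)
print(runs$lengths)
# 3 2 2 2
```

Runs 1 and 3 are the two Outdata blocks, which is why filtering on Folder after computing gr leaves exactly two groups to summarise.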
Perhaps just paste the Message values together into a single string.
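library(dplyr)

sample %>%
  # Parse the timestamps and give each consecutive run of Folder values its own id.
  # default = first(Folder) keeps the default the same type as Folder.
  mutate(DATE = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p"),
         gr = cumsum(Folder != lag(Folder, default = first(Folder)))) %>%
  # Keep only the Outdata runs
  filter(Folder == "Outdata") %>%
  arrange(gr, DATE) %>%
  group_by(gr) %>%
  # Duration of each run, plus its Message values pasted into one string
  summarise(Duration = difftime(last(DATE), first(DATE), units = "secs"),
            Message = paste0(Message, collapse = "")) %>%
  # Label the runs Outdata1, Outdata2, ... to match the requested output
  mutate(gr = paste0('Outdata', row_number()))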
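If the aim is to build the blocks only from rows whose Message is actually empty, rather than pasting the messages together, one possible variant is to fold that condition into the run detection itself. The sketch below is an assumption about what the question intends, not part of the original answer: the helper column is_target is a hypothetical name, the sample data is rebuilt by hand as character columns, and tz = "UTC" is set only to make the parsing independent of the local time zone.

```r
library(dplyr)

# Sample data rebuilt as plain character columns (same values as the dput above)
sample <- data.frame(
  Folder  = c("Outdata", "Outdata", "Outdata", "In", "In",
              "Outdata", "Outdata", "In", "In"),
  Date    = c("9/9/2019 5:46:00 AM", "9/9/2019 5:46:01 AM", "9/9/2019 5:46:02 AM",
              "9/9/2019 5:46:03 AM", "9/9/2019 5:46:04 AM",
              "9/10/2019 6:00:01 AM", "9/10/2019 6:00:02 AM",
              "9/11/2019 7:50:00 AM", "9/11/2019 7:50:01 AM"),
  Message = c("", "", "", "hello", "hello", "", "", "hello", "hello"),
  stringsAsFactors = FALSE
)

result <- sample %>%
  mutate(DATE = as.POSIXct(Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC"),
         # A row qualifies when Folder contains "Out" AND its Message is empty
         is_target = grepl("Out", Folder, fixed = TRUE) & Message == "",
         # Start a new run id whenever the qualifying status flips
         gr = cumsum(is_target != lag(is_target, default = FALSE))) %>%
  filter(is_target) %>%
  group_by(gr) %>%
  summarise(Duration = difftime(max(DATE), min(DATE), units = "secs")) %>%
  mutate(gr = paste0("Outdata", row_number()))

print(result)
```

With the sample data this yields two blocks, Outdata1 and Outdata2, lasting 2 and 1 seconds. Because the run id is computed on the combined condition, an "Out" row that carries a non-empty Message would split a block instead of being counted inside it.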