我试图在同一data.table中的某个时间范围内找到平均值,然后将其保存到新列中。
以下是样本数据集
> x
ts value avg
1: 2015-01-01 00:00:23 9 0
2: 2015-01-01 00:01:56 11 0
3: 2015-01-01 00:02:03 18 0
4: 2015-01-01 00:03:16 1 0
5: 2015-01-01 00:05:19 6 0
6: 2015-01-01 00:05:54 16 0
7: 2015-01-01 00:06:27 13 0
8: 2015-01-01 00:06:50 7 0
9: 2015-01-01 00:08:41 12 0
10: 2015-01-01 00:09:08 17 0
11: 2015-01-01 00:09:28 8 0
12: 2015-01-01 00:10:56 5 0
13: 2015-01-01 00:11:44 10 0
14: 2015-01-01 00:12:23 20 0
15: 2015-01-01 00:12:28 2 0
16: 2015-01-01 00:12:37 15 0
17: 2015-01-01 00:12:42 4 0
18: 2015-01-01 00:12:48 19 0
19: 2015-01-01 00:13:41 3 0
20: 2015-01-01 00:16:04 14 0
我的代码将值10.5分配给所有行,但没有得到预期的结果。这是我的代码。
require(lubridate)
x[, avg := x[ts>=ts-minutes(2) & ts<=ts , mean(value)], verbose=TRUE ]
我希望结果如下
ts value avg
1 01-01-2015 00:00:23 9 0
2 01-01-2015 00:01:56 11 9
3 01-01-2015 00:02:03 18 10
4 01-01-2015 00:03:16 1 14.5
5 01-01-2015 00:05:19 6 0
6 01-01-2015 00:05:54 16 6
7 01-01-2015 00:06:27 13 11
8 01-01-2015 00:06:50 7 11.66666667
9 01-01-2015 00:08:41 12 7
10 01-01-2015 00:09:08 17 12
11 01-01-2015 00:09:28 8 14.5
12 01-01-2015 00:10:56 5 12.5
13 01-01-2015 00:11:44 10 5
14 01-01-2015 00:12:23 20 7.5
15 01-01-2015 00:12:28 2 11.66666667
16 01-01-2015 00:12:37 15 9.25
17 01-01-2015 00:12:42 4 10.4
18 01-01-2015 00:12:48 19 9.333333333
19 01-01-2015 00:13:41 3 11.666667
20 01-01-2015 00:16:04 14 0
我想对具有更大数据集的数据执行此操作,同时还要在单独的列中分别使用最小值和最大值(此处仅显示了平均值函数)。任何帮助将是巨大的。
下面是可复制的代码。
#reproducible code
ts<- seq(from=ISOdatetime(2015,1,1,0,0,0,tz="GMT"),to=ISOdatetime(2015,1,1,0,0,19,tz="GMT"), by="sec")
set.seed(2)
ts <-ts + seconds(round(runif(20,0,1000),0))
value <- 1:20
avg <- 0
x <- data.table(ts,value,avg)
setkey(x,ts)
x
感谢@Saksham使我努力实现应用功能。这是我想出的解决方案。
find <- function(y){
mean(x[ts>=y-minutes(2) & ts<y,value])
}
x$avg <- mapply(find,x[,ts])
> x
ts value avg
1: 2015-01-01 00:00:23 9 NaN
2: 2015-01-01 00:01:56 11 9.000000
3: 2015-01-01 00:02:03 18 10.000000
4: 2015-01-01 00:03:16 1 14.500000
5: 2015-01-01 00:05:19 6 NaN
6: 2015-01-01 00:05:54 16 6.000000
7: 2015-01-01 00:06:27 13 11.000000
8: 2015-01-01 00:06:50 7 11.666667
9: 2015-01-01 00:08:41 12 7.000000
10: 2015-01-01 00:09:08 17 12.000000
11: 2015-01-01 00:09:28 8 14.500000
12: 2015-01-01 00:10:56 5 12.500000
13: 2015-01-01 00:11:44 10 5.000000
14: 2015-01-01 00:12:23 20 7.500000
15: 2015-01-01 00:12:28 2 11.666667
16: 2015-01-01 00:12:37 15 9.250000
17: 2015-01-01 00:12:42 4 10.400000
18: 2015-01-01 00:12:48 19 9.333333
19: 2015-01-01 00:13:41 3 11.666667
20: 2015-01-01 00:16:04 14 NaN
这会做吗
ts[,avg] <- ts[,val] - 0.5
从逻辑上看,看到您的预期结果,它在做相同的事情。如果我误解了,您可以编辑期望的结果以使其更灵活。
编辑:
这种基本的R方法应该可以解决问题。由于我对操纵时间不熟悉,我假设算术的工作方式与大多数语言相同
interval <- minutes(2) #Assuming this is how we define 5 minutes
x$avg <- apply( x, 1, function(y){
mean(x$value[x$time > ( y["time"]) - interval ) && x$time < y["time"]])
})
本文收集自互联网,转载请注明来源。
如有侵权,请联系[email protected] 删除。
我来说两句