Here is my data frame lanec:
read.table(textConnection(scan(,character(),sep="\n")))
vehicle.id frame.id svel PrecVehVel
1 2 1 55 59
2 2 2 55 59
3 2 3 53 57
4 2 4 50 54
5 2 5 48 52
6 3 3 49 53
7 3 4 55 59
8 3 5 55 59
9 3 6 43 47
10 3 7 45 49
11 3 8 52 56
12 3 9 50 54
13 4 1 38 42
14 4 2 42 46
15 4 3 45 49
16 4 4 48 52
17 4 5 50 54
18 4 6 52 56
19 4 7 55 59
20 5 6 49 53
21 5 7 52 56
22 5 8 54 58
23 5 9 58 62
24 5 10 60 64
25 5 11 63 67
26 5 12 70 74
<Carriage return>
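For each vehicle.id, I want to find the correlation (cor) between svel and PrecVehVel (the speed of the vehicle and the speed of the preceding vehicle, respectively), using every 3 consecutive rows. This means that in the lanec data frame, for vehicle.id==2, R should first find the correlation for these rows: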
svel PrecVehVel
1 55 59
2 55 59
3 53 57
that is, between svel (55,55,53) and PrecVehVel (59,59,57), then start again from the second row and find the correlation for:
svel PrecVehVel
2 55 59
3 53 57
4 50 54
that is, between svel (55,53,50) and PrecVehVel (59,57,54), and so on.
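To make the two windows described above concrete, here is a minimal sketch that computes them by hand for vehicle.id==2. It assumes the first four rows of the question's data and uses only base R's cor(); the variable names first_window and second_window are made up for illustration.

```r
# First four rows of vehicle.id == 2, copied from the question's data
svel       <- c(55, 55, 53, 50)
PrecVehVel <- c(59, 59, 57, 54)

# Window 1 covers rows 1-3, window 2 covers rows 2-4
first_window  <- cor(svel[1:3], PrecVehVel[1:3])
second_window <- cor(svel[2:4], PrecVehVel[2:4])

print(c(first_window, second_window))
```

In this sample PrecVehVel is always svel + 4, so every window is perfectly correlated, which is why the expected output shows 1 throughout.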
The output should look something like this:
vehicle.id frames speed.cor
2 1 - 3 1
2 2 - 4 1
2 3 - 5 1
2 4 - 5 1
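Note that the last entry in the frames column has only 2 frames available for the correlation, because there is no more data for vehicle 2. The best I can do with my limited knowledge of R is: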
ddply(lanec, 'vehicle.id', summarize, speed.cor = cor(svel, PrecVehVel) )
But this clearly does not meet the goal, because it computes the correlation over all rows of each vehicle.id.
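This is a tricky problem. I've found that for these "rolling calculation" type problems, most base R solutions are too slow for data of any real size. I've had a lot of luck using the data.table package, especially where speed is a concern. I've also included the parallel package in case you have a large number of observations and need this to run faster. It is set to mc.cores=1 for now, but if you are running Mac or Linux you can raise it.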
lanec <- structure(list(vehicle.id = c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L,
5L, 5L), frame.id = c(1L, 2L, 3L, 4L, 5L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 6L, 7L, 8L, 9L, 10L, 11L,
12L), svel = c(55, 75, 53, 50, 32, 49, 55, 55, 43, 45, 52, 50,
38, 42, 45, 48, 50, 52, 55, 49, 52, 54, 58, 60, 63, 70), PrecVehVel = c(59,
59, 57, 54, 52, 53, 59, 59, 47, 49, 56, 54, 42, 46, 49, 52, 54,
56, 59, 53, 56, 58, 62, 64, 67, 74)), .Names = c("vehicle.id",
"frame.id", "svel", "PrecVehVel"), class = "data.frame", row.names = c(NA,
-26L))
#Load data.table package
require("data.table")
require("parallel")
data <- data.table(lanec)
#How many frames past the current one should the window extend? (cor.vec = 2 gives a 3-frame window)
cor.vec <- 2
##Assign each vehicle an ID
data[,ID:=.GRP,by=c("vehicle.id")]
##Group values at the list level
Ref <- data[,list(frame.id=list(I(frame.id)),svel.list=list(I(svel)),PrecVehVel.list=list(I(PrecVehVel))),by=list(vehicle.id)]
#Calculate rolling calculation
data$Roll.Corr <- mcmapply(FUN = function(RD, NUM) {
  #mcmapply is a multicore version of mapply. If running Linux or Mac, you can up the amount of cores and have the code run faster
  #d is the difference between the current frame.id and all other frame.id's within the same vehicle id.
  d <- (Ref$frame.id[[NUM]] - RD)
  #The following checks whether d is within the "window" you want. If not in the desired "window", then svel1 and prec1 will have zero values. If in desired "window", then its value will be the respective "svel" and "prec" value in original data.
  svel1 <- (d >= 0 & d <= cor.vec)*Ref$svel.list[[NUM]]
  prec1 <- (d >= 0 & d <= cor.vec)*Ref$PrecVehVel.list[[NUM]]
  #Following discards all data points not in sliding "window" (deletes all of the zeros)
  keep <- which(d >= 0 & d <= cor.vec)
  svel1 <- svel1[keep]
  prec1 <- prec1[keep]
  #Following makes sure a correlation value is only provided if the number of points within the window is larger than the correlation "window" length
  if (length(svel1)>cor.vec){
    cor(svel1,prec1)
  } else {
    NA
  }
}, RD = data$frame.id,NUM=data$ID,mc.cores=1)
#Add the first and last frame of each window, then print the data
data[,frame.start:=ifelse(is.na(Roll.Corr),NA,frame.id)]
data[,frame.end:=ifelse(is.na(Roll.Corr),NA,frame.id+cor.vec)]
head(data,10)
vehicle.id frame.id svel PrecVehVel ID Roll.Corr frame.start frame.end
1: 2 1 55 59 1 0.5694948 1 3
2: 2 2 75 59 1 0.8635894 2 4
3: 2 3 53 57 1 0.8746393 3 5
4: 2 4 50 54 1 NA NA NA
5: 2 5 32 52 1 NA NA NA
6: 3 3 49 53 2 1.0000000 3 5
7: 3 4 55 59 2 1.0000000 4 6
8: 3 5 55 59 2 1.0000000 5 7
9: 3 6 43 47 2 1.0000000 6 8
10: 3 7 45 49 2 1.0000000 7 9
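As a cross-check on the output above, here is a minimal base R sketch that builds the table layout the question asked for (vehicle.id, frames, speed.cor), including the shorter final window. It is not the data.table approach used above and will likely be slower on large data. roll_cor is a hypothetical helper name, and the sketch assumes each vehicle's rows are already sorted by frame.id with no gaps, because it slides over row positions rather than frame.id differences.

```r
# Rolling correlation over consecutive rows within each vehicle (base R only).
# roll_cor is a hypothetical helper, not part of any package.
roll_cor <- function(df, width = 3) {
  pieces <- lapply(split(df, df$vehicle.id), function(g) {
    n <- nrow(g)
    if (n < 2) return(NULL)               # cor() needs at least 2 rows
    starts <- seq_len(n - 1)              # the last window may hold only 2 rows
    ends <- pmin(starts + width - 1, n)   # clip each window at the vehicle's last row
    data.frame(
      vehicle.id = g$vehicle.id[starts],
      frames = paste(g$frame.id[starts], "-", g$frame.id[ends]),
      speed.cor = mapply(function(s, e) cor(g$svel[s:e], g$PrecVehVel[s:e]),
                         starts, ends)
    )
  })
  out <- do.call(rbind, unname(Filter(Negate(is.null), pieces)))
  rownames(out) <- NULL
  out
}

# Vehicle 2 only, using the modified svel values from the answer's lanec
veh <- data.frame(vehicle.id = rep(2L, 5), frame.id = 1:5,
                  svel = c(55, 75, 53, 50, 32),
                  PrecVehVel = c(59, 59, 57, 54, 52))
res <- roll_cor(veh)
print(res)
```

The first three speed.cor values should agree with the Roll.Corr column above for vehicle 2. The fourth row (frames 4 - 5) is the two-row window that the data.table version reports as NA. A correlation over two points is always 1 or -1 unless one column is constant, so treat that row with caution.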