How do I find the correlation for every n rows in R?

Posted by umair durrani in Dev

Umair Durrani

The following is my data frame lanec:

read.table(textConnection(scan(,character(),sep="\n")))
   vehicle.id frame.id svel PrecVehVel
1           2        1   55         59
2           2        2   55         59
3           2        3   53         57
4           2        4   50         54
5           2        5   48         52
6           3        3   49         53
7           3        4   55         59
8           3        5   55         59
9           3        6   43         47
10          3        7   45         49
11          3        8   52         56
12          3        9   50         54
13          4        1   38         42
14          4        2   42         46
15          4        3   45         49
16          4        4   48         52
17          4        5   50         54
18          4        6   52         56
19          4        7   55         59
20          5        6   49         53
21          5        7   52         56
22          5        8   54         58
23          5        9   58         62
24          5       10   60         64
25          5       11   63         67
26          5       12   70         74

<Carriage return>

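I want to find the correlation (cor) between svel and PrecVehVel (the speed of the vehicle and the speed of the preceding vehicle, respectively) by vehicle.id, over every 3 consecutive rows. This means that in the lanec data frame, for vehicle.id==2, R should first find the correlation between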

   svel PrecVehVel
1    55         59
2    55         59
3    53         57

svel (55,55,53) and PrecVehVel (59,59,57), then start again from the second row and find the correlation between

   svel PrecVehVel
2    55         59
3    53         57
4    50         54

svel (55,53,50) and PrecVehVel (59,57,54), and so on.

The output should look something like this:

vehicle.id     frames     speed.cor
2               1 - 3     1
2               2 - 4     1
2               3 - 5     1
2               4 - 5     1

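Note that the last entry in the frames column has only 2 frames to compute the correlation from, because there is no more data for vehicle 2. The best I could do with my limited knowledge of R is: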

ddply(lanec, 'vehicle.id', summarize, speed.cor = cor(svel, PrecVehVel) )

But this obviously does not meet the goal, because it computes a single correlation over all the rows of each vehicle.id.
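To make the target concrete, here is a minimal sketch that computes the first two windows for vehicle 2 by hand and compares them with the single whole-group value that the ddply call returns. The vectors are copied from the vehicle.id==2 rows above; the names w1, w2 and whole are only illustrative.

```r
# svel and PrecVehVel for vehicle.id == 2, copied from the data frame above
svel       <- c(55, 55, 53, 50, 48)
PrecVehVel <- c(59, 59, 57, 54, 52)

# Window 1: rows 1 to 3
w1 <- cor(svel[1:3], PrecVehVel[1:3])

# Window 2: rows 2 to 4 (the window slides forward by one row)
w2 <- cor(svel[2:4], PrecVehVel[2:4])

# What cor(svel, PrecVehVel) inside ddply computes: one value for all 5 rows
whole <- cor(svel, PrecVehVel)

print(c(window1 = w1, window2 = w2, whole = whole))
```

The desired result has one value per starting row, whereas the grouped summary collapses each vehicle to a single number.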

Mike Gahan

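This is a tricky one. I find that for these "rolling calculation" types of problems, most base R solutions are too slow for data of any real size. I have had a lot of luck using the data.table package, especially where speed is concerned. I have also included the parallel package in case you have a large number of observations and need this to run faster. It is set to mc.cores=1 right now, but if you are running Mac or Linux you can increase that.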
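As a rough sketch of how the core count might be chosen, the snippet below uses detectCores() from the parallel package and falls back to a single core on Windows, where forking is unavailable. The variable names available and n.cores are assumptions for illustration, and leaving one core free is just a common habit, not a requirement.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1
available <- detectCores()
if (is.na(available)) available <- 1L

# mcmapply relies on forking, which Windows does not support,
# so mc.cores must stay at 1 there
n.cores <- if (.Platform$OS.type == "windows") 1L else max(1L, available - 1L)

# Toy call: square four numbers, spread over n.cores workers
squares <- mcmapply(function(x) x^2, 1:4, mc.cores = n.cores)
print(squares)
```

The resulting value could then be passed as mc.cores in place of the hard-coded 1.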

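#Note: svel for vehicle 2 differs from the question's data (75 and 32 instead of 55 and 48), so the correlations are not all exactly 1
lanec <- structure(list(vehicle.id = c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 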
3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 
5L, 5L), frame.id = c(1L, 2L, 3L, 4L, 5L, 3L, 4L, 5L, 6L, 7L, 
8L, 9L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L), svel = c(55, 75, 53, 50, 32, 49, 55, 55, 43, 45, 52, 50, 
38, 42, 45, 48, 50, 52, 55, 49, 52, 54, 58, 60, 63, 70), PrecVehVel = c(59, 
59, 57, 54, 52, 53, 59, 59, 47, 49, 56, 54, 42, 46, 49, 52, 54, 
56, 59, 53, 56, 58, 62, 64, 67, 74)), .Names = c("vehicle.id", 
"frame.id", "svel", "PrecVehVel"), class = "data.frame", row.names = c(NA, 
-26L))

#Load data.table package
require("data.table")
require("parallel")
data <- data.table(lanec)

#What length of correlation vector do you want?
cor.vec <- 2

##Assign each vehicle a sequential group ID
data[,ID:=.GRP,by=c("vehicle.id")]

##Group values at the list level
Ref <- data[,list(frame.id=list(I(frame.id)),svel.list=list(I(svel)),PrecVehVel.list=list(I(PrecVehVel))),by=list(vehicle.id)]

#Calculate rolling calculation
data$Roll.Corr <- mcmapply(FUN = function(RD, NUM) {

  #mcmapply is a multicore version of mapply. If running Linux or Mac, you can up the amount of cores and have the code run faster

  #d is the difference between the current frame.id and all other frame.id's within the   same vehicle id.
  d <- (Ref$frame.id[[NUM]] - RD)

  #The following checks whether d is within the "window" you want. If not in the desired "window", then svel1 and prec1 will have zero values. If in desired "window", then its value will be the respective "svel" and "prec" value in original data.
  svel1 <- (d >= 0 & d <= cor.vec)*Ref$svel.list[[NUM]]
  prec1 <- (d >= 0 & d <= cor.vec)*Ref$PrecVehVel.list[[NUM]]

  #Following discards all data points not in sliding "window" (deletes all of the zeros)
  keep <- which(d >= 0 & d <= cor.vec)
  svel1 <- svel1[keep]
  prec1 <- prec1[keep]

  #Following makes sure a correlation value is only provided if the number of points within the window is larger than the correlation "window" length
  if (length(svel1)>cor.vec){
      cor(svel1,prec1) 
    } else {
      NA
    }   
}, RD = data$frame.id,NUM=data$ID,mc.cores=1)

#Print data
data[,frame.start:=ifelse(is.na(Roll.Corr),NA,frame.id)]
data[,frame.end:=ifelse(is.na(Roll.Corr),NA,frame.id+cor.vec)]
head(data,10)

    vehicle.id frame.id svel PrecVehVel ID Roll.Corr frame.start frame.end
 1:          2        1   55         59  1 0.5694948           1         3
 2:          2        2   75         59  1 0.8635894           2         4
 3:          2        3   53         57  1 0.8746393           3         5
 4:          2        4   50         54  1        NA          NA        NA
 5:          2        5   32         52  1        NA          NA        NA
 6:          3        3   49         53  2 1.0000000           3         5
 7:          3        4   55         59  2 1.0000000           4         6
 8:          3        5   55         59  2 1.0000000           5         7
 9:          3        6   43         47  2 1.0000000           6         8
10:          3        7   45         49  2 1.0000000           7         9
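For comparison, here is a minimal base R sketch that produces the vehicle.id / frames / speed.cor layout requested in the question, including the shorter trailing window. It is not the data.table approach above: roll_cor_by_vehicle is a hypothetical helper, it slides over consecutive rows rather than frame.id differences, and it assumes the rows are already sorted by frame.id within each vehicle. It is likely to be slow on large data.

```r
# Hypothetical helper: rolling correlation over `width` consecutive rows per vehicle
roll_cor_by_vehicle <- function(df, width = 3L) {
  pieces <- lapply(split(df, df$vehicle.id), function(g) {
    n <- nrow(g)
    if (n < 2L) return(NULL)  # a correlation needs at least 2 points

    # Start a window at every row that still has at least one row after it,
    # so the last windows may be shorter than `width`
    starts <- seq_len(n - 1L)
    ends   <- pmin(starts + width - 1L, n)

    data.frame(
      vehicle.id = rep(g$vehicle.id[1], length(starts)),
      frames     = paste(g$frame.id[starts], "-", g$frame.id[ends]),
      speed.cor  = vapply(seq_along(starts), function(i) {
        idx <- starts[i]:ends[i]
        # cor() yields NA with a warning when either series is constant
        suppressWarnings(cor(g$svel[idx], g$PrecVehVel[idx]))
      }, numeric(1)),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, unname(pieces))
  rownames(out) <- NULL
  out
}

# Vehicle 2 from the question's original data
veh2 <- data.frame(
  vehicle.id = rep(2L, 5),
  frame.id   = 1:5,
  svel       = c(55, 55, 53, 50, 48),
  PrecVehVel = c(59, 59, 57, 54, 52)
)

res <- roll_cor_by_vehicle(veh2)
print(res)
```

On the question's vehicle 2 rows this gives four windows (1 - 3, 2 - 4, 3 - 5, 4 - 5), each with a correlation of 1 up to floating-point error, which matches the table the question asks for.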


Edited on 2021-02-9
