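My question is one of approach. I create 3-dimensional arrays in R using an approach pieced together from searching SO (this is my first question; R is a constraint). The use case is that the final array needs to be updated frequently, but the two input arrays are updated at different times. The goal is to minimize the time needed to create the final array while also keeping the number of intermediate steps as small as possible.
I know I could use Rcpp, and that I make more assignments than strictly necessary for the sake of readability, but what I would like to know is:
Is there a better way to accomplish this?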
if (!require("geosphere")) {install.packages("geosphere"); library(geosphere)}
#simulate real data
dimLength <- 418
latLong <- cbind(rep(40,418),rep(2,418))
potentialChurn <- as.matrix(rep(500,418))
#create 2D matrix
valueMat <- matrix(0,dimLength,dimLength)
value <- potentialChurn
valueTranspose <- t(value)
for (s in 1:dimLength){valueMat[s,] <- value + valueTranspose[s]}
diag(valueMat) <- 0
#create 3D matrix from copying 2D matrix
bigValMat <- array(0,dim=c(dimLength,dimLength,dimLength))
for (d in 1:dimLength){bigValMat[,d,] <- valueMat}
#get crow fly distance between locations, create 2D matrix
distMat <- as.matrix(outer(seq(dimLength), seq(dimLength), Vectorize(function(i, j) distCosine(latLong[i,], latLong[j,]))))
###create 3D matrix by calculating distance between any two locations;
# create 2D matrix from each column in original 2D matrix
# add this column-replicated 2D matrix to the original
bigDistMat <- array(0,dim=c(dimLength,dimLength,dimLength))
for (p in 1:dimLength){
  addCol <- distMat[,p]
  addMatrix <- as.matrix(addCol)
  for (y in 2:dimLength) {addMatrix <- cbind(addMatrix,addCol)}
  bigDistMat[,p,] <- data.matrix(distMat) + data.matrix(addMatrix)
}
#Final matrix calculation
bigValDistMat <- bigValMat / bigDistMat
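...as background, this was part of a two-step-ahead forecasting policy that we built as students for a class, using Barcelona bike-share (Bicing) data. The project is over, and I am interested in how I could have done it better.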
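In general, if you want to speed up your code, you want to identify the bottlenecks and fix them, as explained here. Putting all your code in a function is a good idea.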
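A minimal sketch of that advice, assuming only base R: wrap each candidate in a function and time it with system.time(). The helper names buildLoop and buildOuter are made up for illustration and are not from the question. For a finer breakdown, base R's Rprof() and summaryRprof() are another option.

```r
# Hypothetical helpers (not from the question): two ways to build the
# pairwise-sum matrix, wrapped in functions so they can be timed.
buildLoop <- function(v) {
  n <- length(v)
  m <- matrix(0, n, n)
  for (s in seq_len(n)) m[s, ] <- v + v[s]
  diag(m) <- 0
  m
}

buildOuter <- function(v) {
  m <- outer(v, v, FUN = "+")
  diag(m) <- 0
  m
}

set.seed(1)
v <- rnorm(418, 500, 10)

# system.time() reports user, system and elapsed seconds for one call
tLoop  <- system.time(resLoop  <- buildLoop(v))
tOuter <- system.time(resOuter <- buildOuter(v))
cat("loop:", tLoop[["elapsed"]], "s; outer:", tOuter[["elapsed"]], "s\n")

# Always confirm that the faster version returns the same result
stopifnot(isTRUE(all.equal(resLoop, resOuter)))
```

Timings for a step this small may be too short to tell apart; the same pattern applies to the slower 3D steps.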
In your particular case, your R code relies far too heavily on for loops. You need to vectorize your code more.
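To illustrate what vectorizing means here, this small sketch replaces the inner cbind loop from the question with a single call to matrix(). The input addCol is random stand-in data and n is kept small so the result is easy to inspect.

```r
set.seed(1)
n <- 6
addCol <- runif(n)  # stand-in for one column of a distance matrix

# Loop version: cbind() copies the growing matrix at every iteration
grown <- as.matrix(addCol)
for (y in 2:n) grown <- cbind(grown, addCol)

# Vectorized version: matrix() recycles the column to fill n x n in one step
filled <- matrix(addCol, nrow = n, ncol = n)

# Same values; unname() drops the column names that cbind() attaches
stopifnot(identical(unname(grown), filled))
```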
Edit: and now the long answer:
#simulate real data, you want them to be random
dimLength <- 418
latLong <- cbind(rnorm(dimLength,40,0.5),rnorm(dimLength,2,0.5))
potentialChurn <- as.matrix(rnorm(dimLength,500,10))
#create 2D matrix, outer is designed for this operation
value <- as.vector(potentialChurn)
valueMat <- outer(value, value, FUN="+")
diag(valueMat) <- 0
# create 3D matrix from copying 2D matrix, again, avoid for loop
bigValMat <- array(rep(valueMat,dimLength),dim=c(dimLength,dimLength,dimLength))
# and use aperm to permute the dimensions
bigValMat <- aperm(bigValMat,c(1,3,2))
#get crow fly distance between locations, create 2D matrix
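# other packages are available to compute that kind of distance matrix
# but let's stay with distCosine from geosphere (loaded in the question) and base R
# wordy but so much faster (and easier to read)
# note: distCosine() expects (longitude, latitude) columns; the columns are
# passed in the same order as in the question, so swap them if column 1 is latitude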
longs1 <- rep(latLong[,1],dimLength)
lats1 <- rep(latLong[,2],dimLength)
latLong1 <- cbind(longs1,lats1)
longs2 <- rep(latLong[,1],each=dimLength)
lats2 <- rep(latLong[,2],each=dimLength)
latLong2 <- cbind(longs2,lats2)
distMat <- matrix(distCosine(latLong1,latLong2),ncol=dimLength)
###create 3D matrix by calculating distance between any two locations;
# same logic as for bigValMat
addMatrix <- array(rep(distMat,dimLength),dim=rep(dimLength,3))
distMat3D <- aperm(addMatrix,c(1,3,2))
bigDistMat <- addMatrix + distMat3D
#Final matrix calculation
bigValDistMat <- bigValMat / bigDistMat
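The array(rep(...)) plus aperm step is the least obvious part, so here is a small check, a sketch on a 3 x 3 stand-in matrix rather than the real data, that it produces the same array as the loop from the question. The names viaLoop, stacked and viaAperm are made up for this example.

```r
m <- matrix(1:9, nrow = 3)   # small stand-in for valueMat
n <- nrow(m)

# Loop version from the question: every slice [, d, ] is a copy of m
viaLoop <- array(0L, dim = c(n, n, n))
for (d in seq_len(n)) viaLoop[, d, ] <- m

# Vectorized version: stack n copies along the third dimension, so that
# stacked[i, j, k] == m[i, j], then swap dimensions 2 and 3
stacked <- array(rep(m, n), dim = c(n, n, n))
viaAperm <- aperm(stacked, c(1, 3, 2))

# After the swap, viaAperm[i, d, k] == m[i, k] for every d
stopifnot(identical(viaLoop, viaAperm))
```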
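This version is about 25 times faster than the initial code (76 s -> 3 s). There is still plenty of room for improvement, but you get the idea: avoid for, cbind and co. at all costs.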
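As one possible further step, and only a sketch under stated assumptions, the final array can be built directly from the two 2D matrices without creating bigValMat and bigDistMat first. It relies on the fact that the loops in the question amount to result[i, p, k] = valueMat[i, k] / (distMat[i, p] + distMat[i, k]). The inputs below are small random stand-ins, with Euclidean distances from dist() in place of distCosine() so that no package is needed. Whether this is actually faster at n = 418 is not measured here.

```r
set.seed(1)
n <- 5
# Stand-in inputs (assumptions, not the Bicing data)
pts <- cbind(rnorm(n, 40, 0.5), rnorm(n, 2, 0.5))
distMat <- unname(as.matrix(dist(pts)))
value <- rnorm(n, 500, 10)
valueMat <- outer(value, value, FUN = "+")
diag(valueMat) <- 0

# Reference: element-by-element definition implied by the question's loops
ref <- array(0, dim = c(n, n, n))
for (p in seq_len(n)) {
  for (k in seq_len(n)) {
    ref[, p, k] <- valueMat[, k] / (distMat[, p] + distMat[, k])
  }
}

# Direct version: kIdx repeats each column index n times, so the n x n^2
# matrices below are already laid out in (i, p, k) order; the vector
# as.vector(distMat) is recycled once per value of k
kIdx <- rep(seq_len(n), each = n)
direct <- array(valueMat[, kIdx] / (as.vector(distMat) + distMat[, kIdx]),
                dim = c(n, n, n))

# Entries with i == p == k are 0/0 = NaN in both versions, as in the original
stopifnot(isTRUE(all.equal(ref, direct)))
```

This still allocates temporaries of the same size as the result, so the gain, if any, comes from skipping the two intermediate 3D arrays and the aperm calls.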