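This code works fine, but it is a bit slow. I noticed that it only runs on one core of the processor. If it used multiple cores, it would probably be faster.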
### proximity filter
options("scipen"=100)
library(geosphere)
library(dplyr)       # %>%, slice, rowwise, mutate
library(data.table)  # setDT
# split up data into regions
splitdt<-split(geocities, geocities$airport_code)
## reduce cities
dat=geocities[FALSE,][]
currentregion=1
while (currentregion <= NROW(splitdt)){
  workingregion <- as.data.frame(splitdt[[currentregion]]) ## set region
  workingregion$remove = FALSE
  setDT(workingregion)
  #plot(workingregion$longitude,workingregion$latitude)
  currentorigin=1
  while (currentorigin <= NROW(workingregion)) {
    # choose which row to use
    # as the first part of the distance formula
    workingorigin <- workingregion[,c("longitude","latitude")] %>% slice(currentorigin) ## set LeadingRow city
    setDT(workingorigin)
    # calculate the distance from the specific row chosen
    # and only keep ones which are further away than the threshold (17000 m in the test below)
    workingregion<-workingregion %>% rowwise() %>% mutate(remove =
      ifelse(distHaversine(c(longitude, latitude), workingorigin) != 0 & # keep workingorigin city
             distHaversine(c(longitude, latitude), workingorigin) < 17000,TRUE,workingregion$remove))
    # remove matched cities
    workingregion <- workingregion[workingregion$remove!=TRUE,]
    currentorigin = currentorigin+1
  }
  currentregion = currentregion+1
  # save results
  workingregion <- workingregion[workingregion$remove!=TRUE,]
  dat <- rbind(dat, workingregion) #, fill=TRUE
}
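The first thing I noticed is: dat <- rbind(dat, workingregion)
This line grows an object (here a data frame) dynamically inside the loop, which is discouraged and will be slow, because the accumulated data may be copied on every iteration.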
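One way to avoid growing `dat` inside the loop is to store each region's result in a preallocated list and bind everything once at the end. The following is only a minimal sketch under assumptions: `toy` is made-up data standing in for `geocities`, and `keep_first_row` is a hypothetical placeholder for the real per-region filtering step.

```r
# Toy data standing in for geocities (hypothetical values)
toy <- data.frame(
  airport_code = c("AAA", "AAA", "BBB", "BBB", "BBB"),
  longitude    = c(10.0, 10.1, 20.0, 20.1, 20.2),
  latitude     = c(50.0, 50.1, 60.0, 60.1, 60.2)
)
regions <- split(toy, toy$airport_code)

# Hypothetical stand-in for the per-region filtering step
keep_first_row <- function(region) region[1, , drop = FALSE]

# Preallocate a list with one slot per region instead of growing a data frame
results <- vector("list", length(regions))
for (i in seq_along(regions)) {
  results[[i]] <- keep_first_row(regions[[i]])
}

# Bind all pieces together once, after the loop
dat <- do.call(rbind, results)
```

With data.table loaded, `rbindlist(results)` could replace the final `do.call(rbind, results)` step.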
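I know this does not answer your question about parallelizing the loop. However, I just went through a similar exercise collecting the results of 100,000 SQL queries, and I sped up my code 60x by being memory-aware.
I also parallelized my code with foreach and %dopar%. This is a good fit for Windows, and it is easy to set up a cluster (one instance of R per core).
Here is an example that helped me: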
library(parallel)
library(doParallel)  # also attaches foreach
# Uses all but one core
cl = makeCluster(detectCores() - 1)
# Register the cluster so that %dopar% actually runs on it;
# without this, foreach falls back to sequential execution with a warning
registerDoParallel(cl)
# Necessary to give your instances of R on each core the necessary tools to do what
# happens in loop
clusterExport(cl, '<variable names>')
clusterEvalQ(cl, library(packages ))
# parallel loop for going through each region (in your case)
# the value of the loop body for each region is collected into a list
results <- foreach(currentregion = splitdt) %dopar% # iterates over splitdt to cores
{
  <body of loop>
}
# Shut down cluster
stopCluster(cl)
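To make the template above concrete, here is a small end-to-end sketch under assumptions. `toy` is made-up data standing in for `geocities`, `keep_first_row` is a hypothetical placeholder for the real filtering logic, and the worker count of 2 is arbitrary. It is meant to show the shape of the loop, not a tuned solution.

```r
library(doParallel)  # also attaches foreach and parallel

# Toy data standing in for geocities (hypothetical values)
toy <- data.frame(
  airport_code = c("AAA", "AAA", "BBB", "BBB", "BBB"),
  longitude    = c(10.0, 10.1, 20.0, 20.1, 20.2),
  latitude     = c(50.0, 50.1, 60.0, 60.1, 60.2)
)
regions <- split(toy, toy$airport_code)

# Hypothetical stand-in for the per-region filtering step
keep_first_row <- function(region) region[1, , drop = FALSE]

cl <- makeCluster(2)    # two workers, chosen arbitrarily for the sketch
registerDoParallel(cl)  # tell %dopar% to use this cluster

# Each region is sent to a worker; .combine = rbind stacks the returned pieces,
# so nothing is grown by hand inside the loop
dat <- foreach(region = regions, .combine = rbind) %dopar% {
  keep_first_row(region)
}

stopCluster(cl)
```

If the loop body calls functions from a package such as geosphere, the `.packages` argument of `foreach` (for example `.packages = "geosphere"`) loads it on each worker, as an alternative to `clusterEvalQ`.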
Here are some resources on speeding up R code: http://adv-r.had.co.nz/Performance.html (written by the man himself) and https://csgillespie.github.io/efficientR/performance.html
Hope this helps, and good luck!