Slow loop in R

Posted by debugcn in Dev

Charlotte Lamesse

I'm trying to extract data from multiple files in NetCDF (.nc) format. The code I've written works, but it is slow, and I'd welcome any suggestions!

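My working code uses RNetCDF to read one file into an object called temp and converts it into a "long list", in which each variable has either one or three dimensions (lat, lon and pft). I then extract the data from each variable one at a time (varlist[j]) and convert it to a data frame. adply then splits it along each of the three dimensions. The last step builds files, which lets me combine all the files into one big data frame with cbind and rbind.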

Here is the code:

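library(RNetCDF)  # open.nc(), read.nc(), close.nc()
library(plyr)     # adply()

setwd("C:/Users/User/Box Sync/_PhD/PhD_Research/Albedo/Data_CLM/PFTRuns/2005/")
fname<-"b40.20th.1deg.bdrd.002bc.clm2.h0.2005-"
numlist<-c('01','02','03','04','05','06','07','08','09','10','11','12')
varlist<-c(1,2,4,8,9,21)
varname<-c("lon","lat","pft","pft_wtgcell","pft_wtcol","FSR")
files<-matrix(data=NA, nrow=12, ncol=length(varlist))

for (i in 12:12) {
  # build the file name for month i and read every variable into a list
  temp <- paste(c(fname, numlist[i], '.nc'), collapse='')
  nc <- open.nc(temp)
  temp <- read.nc(nc)
  close.nc(nc)  # release the file handle once the data is in memory
  # coerce the list to a data.frame so variables can be picked by column number
  temp <- structure(temp, row.names = c(NA, -288), class = "data.frame")
  for (j in 3:length(varlist)) {
    newname <- paste(c("Y2005", numlist[i], ".", varname[j]), collapse='')
    if (j < 4) {
      # 1-d variable: split along its single dimension
      assign(newname, adply(temp[, varlist[j]], c(1)))
    } else {
      # 3-d variable: split along lon, lat and pft
      assign(newname, adply(temp[, varlist[j]], c(1, 2, 3)))
    }
    files[i, j] <- newname
  }
}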

EDIT: Here is an example of the output of read.nc(open.nc()): [sample output image]

Dave2e

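Looking at your question: you have a series of 3-d arrays that you want to flatten into 1-d arrays and line up with the coordinate vectors (lon, lat and pft).
Although the plyr package has many very useful functions, they tend to be slow, as is the case above.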
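A minimal sketch of that target layout on a tiny made-up array (the names arr and long are illustrative, not from the question). Each row of long pairs one (lon, lat, pft) combination with the matching array value; this works because expand.grid varies its first argument fastest, which matches the column-major order in which R stores arrays.

```r
# Tiny hypothetical 3-d array: 2 lon x 3 lat x 2 pft
lon <- 1:2
lat <- 1:3
pft <- 1:2
arr <- array(seq_len(12), dim = c(length(lon), length(lat), length(pft)))

# One row per (lon, lat, pft) combination; lon varies fastest
long <- expand.grid(lon = lon, lat = lat, pft = pft)

# as.vector() reads the array in column-major order, i.e. the same order
long$value <- as.vector(arr)

# Spot check: the row for lon = 2, lat = 3, pft = 1 should hold arr[2, 3, 1]
long[long$lon == 2 & long$lat == 3 & long$pft == 1, ]
```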

Here is the sample data I created for testing:

#create some test data
set.seed(1)
lon<-1:300
lat<-1:200
pft<-1:15
tot<-length(lon)*length(lat)*length(pft)
pft_wtgcell<- array(rnorm(tot, 10), dim=c(length(lon),length(lat),length(pft)))
pft_wtcol<- array(rnorm(tot, 60, 2), dim=c(length(lon),length(lat),length(pft)))
FSR<- array(rnorm(tot, 100, 3), dim=c(length(lon),length(lat),length(pft)))
temp<-list(lon=lon, lat=lat, pft=pft, pft_wtgcell=pft_wtgcell, pft_wtcol=pft_wtcol, FSR=FSR)

Here is my solution:

numlist<-c('01','02','03','04','05','06','07','08','09','10','11','12')
#varlist<-c(1,2,4,8,9,21)
varname<-c("lon","lat","pft","pft_wtgcell","pft_wtcol","FSR")
files<-matrix(data=NA, nrow=12, ncol=length(varname))
#loop to cycle through the file starts here:
i<-1

#create data.frame for lon, lat and pft
newname<-paste(c("Y2005", numlist[i],".", varname[1]), collapse='')
coord<-expand.grid(temp$lon, temp$lat, temp$pft)
assign(newname, coord)
files[i,1]<-newname
#loop through the variables of interest
#  could probably be simplified.
for (j in 4:length(varname)) {
  newname<-paste(c("Y2005", numlist[i],".", varname[j]), collapse='')
  assign(newname, as.data.frame.table(temp[[varname[j] ]])$Freq)
  files[i,j]<-newname
}

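I avoided converting the sample data into a data frame and worked directly on the list instead. The expand.grid function quickly creates a data frame containing every combination of lon, lat and pft. While looking for tips on flattening a 3-d array, I found a reference to the as.data.frame.table function, which works well in this case, and I store only the last (flattened) column of its result. Since only the required data is stored in the data.frame, rbind should also run faster.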
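The check below, on a small made-up array, illustrates why the as.data.frame.table trick lines up with expand.grid: its Freq column is just the array read in column-major order. As an alternative that the answer does not use, plain as.vector() should give the same values without building the intermediate factor columns; treat that as an assumption to verify on your own data.

```r
set.seed(1)
# Small hypothetical array standing in for one variable such as FSR
arr <- array(rnorm(24), dim = c(4, 3, 2))

# The answer's approach: keep only the flattened value column
freq <- as.data.frame.table(arr)$Freq

# Base-R alternative (not from the answer): flatten the array directly
flat <- as.vector(arr)

# Both read the array in column-major order, so the vectors should match
identical(freq, flat)
```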

I haven't done much error checking, but on my laptop the test case above ran about 500 times faster.
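If the end goal is still one big data frame, as in the question, one possible sketch is to build one long data frame per month and rbind them once at the end, instead of creating named objects with assign(). It assumes each month yields a list shaped like the test data's temp; the helpers flatten_month() and make_temp() and the month column are hypothetical.

```r
# Hypothetical helper: turn one month's list (lon, lat, pft + 3-d arrays)
# into a single long data frame
flatten_month <- function(temp, vars, month) {
  out <- expand.grid(lon = temp$lon, lat = temp$lat, pft = temp$pft)
  for (v in vars) {
    out[[v]] <- as.vector(temp[[v]])  # same column-major order as expand.grid
  }
  out$month <- month
  out
}

# Hypothetical stand-in for one monthly file read with read.nc()
make_temp <- function() {
  lon <- 1:3; lat <- 1:2; pft <- 1:2
  d <- c(length(lon), length(lat), length(pft))
  n <- prod(d)
  list(lon = lon, lat = lat, pft = pft,
       pft_wtgcell = array(rnorm(n), dim = d),
       FSR = array(rnorm(n, 100, 3), dim = d))
}

set.seed(1)
months <- c("01", "02")
vars <- c("pft_wtgcell", "FSR")
per_month <- lapply(months, function(m) flatten_month(make_temp(), vars, m))

# Bind all months once, rather than growing a data frame inside the loop
all_data <- do.call(rbind, per_month)
dim(all_data)
```

Collecting the pieces in a list and binding once avoids repeatedly copying a growing data frame.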

If this works for you, please accept the answer; otherwise I can tweak it further.


Edited 2021-06-22
