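I'm working through the exercises in Applied Predictive Modeling, the R textbook by the author of caret. I can't get the train function to work with method M5P or M5Rules.
The following code runs fine when I do the tuning by hand: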
library(caret)
library(RWeka)
library(foreach)
library(AppliedPredictiveModeling)
data("permeability")
trainIndex <- createDataPartition(permeability[, 1], p = 0.75,
                                  list = FALSE)
fingerNZV <- nearZeroVar(fingerprints, saveMetrics = TRUE)
trainY <- permeability[trainIndex, 1]
testY <- permeability[-trainIndex, 1]
trainX <- fingerprints[trainIndex, !fingerNZV$nzv]
testX <- fingerprints[-trainIndex, !fingerNZV$nzv]
indx <- createFolds(trainY, k = 10, returnTrain = TRUE)
ctrl <- trainControl('cv', index = indx)
m5Tuner <- t(as.matrix(expand.grid(
  N = c(1, 0),
  U = c(1, 0),
  M = floor(seq(4, 15, length.out = 3))
)))
startTime <- Sys.time()
m5Tune <- foreach(tuner = m5Tuner) %do% {
  m5ctrl <- Weka_control(M = tuner[3],
                         N = tuner[1] == 1,
                         U = tuner[2] == 1)
  mods <- lapply(ctrl$index, function(fold) {
    d <- cbind(data.frame(permeability = trainY[fold]),
               trainX[fold, ])
    mod <- M5P(permeability ~ ., d, control = m5ctrl)
    rmse <- RMSE(predict(mod, as.data.frame(trainX[-fold, ])),
                 trainY[-fold])
    list(model = mod, rmse = rmse)
  })
  mean_rmse <- mean(sapply(mods, '[[', 'rmse'))
  list(models = mods, mean_rmse = mean_rmse)
}
endTime <- Sys.time()
endTime - startTime
# Time difference of 59.17742 secs
The same data and control object (with rules in the grid in place of M; why can't M be specified as a tuning parameter?) never finishes:
m5Tuner <- expand.grid(
  pruned = c("Yes", "No"),
  smoothed = c("Yes", "No"),
  rules = c("Yes", "No")
)
m5Tune <- train(trainX, trainY,
                method = 'M5',
                trControl = ctrl,
                tuneGrid = m5Tuner,
                control = Weka_control(M = 10))
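On the side question about M: caret only lets you tune the parameters that its model definition declares, and for method = "M5" those are pruned, smoothed and rules. Weka's M option (the minimum number of instances per leaf) is not among them, which is why it has to be fixed through control = Weka_control(...) instead. A quick way to check what a method exposes is caret's modelLookup(); this is just a lookup sketch and does not fit anything.

```r
library(caret)

# List the tuning parameters caret exposes for method = "M5".
# Anything not listed here (such as Weka's M, the minimum number of
# instances per leaf) cannot appear as a column of tuneGrid and has
# to be passed as a fixed option via control = Weka_control(...).
m5_params <- modelLookup("M5")
print(m5_params[, c("parameter", "label")])
```

Running the same call with "M5Rules" shows the (smaller) set of parameters for the rule-based variant.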
The example from the book doesn't finish either:
library(caret)
library(RWeka)
library(AppliedPredictiveModeling)
data(solubility)
set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)
ctrl <- trainControl(method = "cv", index = indx)
set.seed(100)
m5Tune <- train(x = solTrainXtrans, y = solTrainY,
                method = "M5",
                trControl = ctrl,
                control = Weka_control(M = 10))
For me at least, this may be a problem with using a parallel backend together with RWeka: my manual example above does not finish with %dopar%.
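To confirm whether a parallel backend is actually registered in the session, foreach's own query functions can report it. This is a small diagnostic sketch that only reads state and changes nothing; the comment about when train() switches to %dopar% reflects my understanding of caret's behaviour rather than documented API.

```r
library(foreach)

# Report which foreach backend (if any) is registered in this session.
# As far as I understand, caret::train() only dispatches resampling via
# %dopar% when allowParallel = TRUE and more than one worker is
# registered; otherwise it falls back to sequential %do%.
backend_name <- if (getDoParRegistered()) getDoParName() else "none"
n_workers <- getDoParWorkers()
cat("foreach backend:", backend_name, "\n")
cat("workers:", n_workers, "\n")
```

If this reports something like doMC with several workers, the Weka models will be sent to forked workers.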
I ran sudo R CMD javareconf before each example and then restarted RStudio.
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
[9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] APMBook_0.0.0.9000 RWeka_0.4-27
[3] caret_6.0-68 ggplot2_2.1.0
[5] lattice_0.20-30 AppliedPredictiveModeling_1.1-6
# dozens of others loaded via namespace
When you use parallel processing with train and RWeka models, you should get the following warning:
In train.default(trainX, trainY, method = "M5", trControl = ctrl, :
Models using Weka will not work with parallel processing with multicore/doMC
Weka's Java interface does not work with multiple workers.
It takes a while, but the train call will finish if you do not register any workers with foreach.
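A minimal sketch of the book's solubility example forced to run sequentially, assuming the only problem is a registered parallel backend such as doMC. registerDoSEQ() (foreach) and the allowParallel argument of trainControl() (caret) are standard features; using both is belt and braces, and either one on its own should keep train() away from %dopar%.

```r
library(caret)
library(RWeka)
library(AppliedPredictiveModeling)
library(foreach)

data(solubility)

# Replace any registered parallel backend (e.g. doMC) with the
# sequential one, since Weka's Java interface does not work with
# multiple worker processes.
registerDoSEQ()

set.seed(100)
indx <- createFolds(solTrainY, returnTrain = TRUE)

# allowParallel = FALSE also tells train() not to use %dopar%,
# even if a parallel backend is registered again before the call.
ctrl <- trainControl(method = "cv", index = indx, allowParallel = FALSE)

set.seed(100)
m5Tune <- train(x = solTrainXtrans, y = solTrainY,
                method = "M5",
                trControl = ctrl,
                control = Weka_control(M = 10))
```

Expect this to be slow, since every resample of every tuning combination is fitted in a single R process.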