What I want to do is loop over OLS fits with polynomials of different degrees, to see which degree performs better at predicting mpg given horsepower (using both LOOCV and KFold). I wrote the code, but I couldn't figure out how to apply the PolynomialFeatures function on each iteration using GridSearchCV, so I ended up writing this:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import LeaveOneOut, KFold
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

df = pd.read_csv('http://web.stanford.edu/~oleg2/hse/auto/Auto.csv')[['horsepower','mpg']].dropna()

pows = range(1, 11)
first, second = [], []  # 'first' is data for the first plot and 'second' is for the second one

# LOOCV: for each degree, average the squared error over the n leave-one-out splits
for p in pows:
    mse = 0
    for train_index, test_index in LeaveOneOut().split(df):
        x_train, x_test = df.horsepower.iloc[train_index], df.horsepower.iloc[test_index]
        y_train, y_test = df.mpg.iloc[train_index], df.mpg.iloc[test_index]
        polynomial_features = PolynomialFeatures(degree=p)
        x = polynomial_features.fit_transform(x_train.values.reshape(-1, 1))  # expand to polynomial features
        ft = LinearRegression().fit(x, y_train)
        x1 = polynomial_features.transform(x_test.values.reshape(-1, 1))  # transform (not re-fit) the test fold
        mse += mean_squared_error(y_test, ft.predict(x1))
    first.append(mse / len(df))

# 10-fold CV, run 9 times per degree to plot a few curves for comparison
for p in pows:
    temp = []
    for i in range(9):
        mse = 0
        for train_index, test_index in KFold(n_splits=10, shuffle=True).split(df):
            x_train, x_test = df.horsepower.iloc[train_index], df.horsepower.iloc[test_index]
            y_train, y_test = df.mpg.iloc[train_index], df.mpg.iloc[test_index]
            polynomial_features = PolynomialFeatures(degree=p)
            x = polynomial_features.fit_transform(x_train.values.reshape(-1, 1))
            ft = LinearRegression().fit(x, y_train)
            x1 = polynomial_features.transform(x_test.values.reshape(-1, 1))
            mse += mean_squared_error(y_test, ft.predict(x1))
        temp.append(mse / 10)
    second.append(temp)
f, pt = plt.subplots(1,2,figsize=(12,5.1))
f.tight_layout(pad=5.0)
pt[0].set_ylim([14,30])
pt[1].set_ylim([14,30])
pt[0].plot(pows, first, color='darkblue', linewidth=1)
pt[0].scatter(pows, first, color='darkblue')
pt[1].plot(pows, second)
pt[0].set_title("LOOCV", fontsize=15)
pt[1].set_title("10-fold CV", fontsize=15)
pt[0].set_xlabel('Degree of Polynomial', fontsize=15)
pt[1].set_xlabel('Degree of Polynomial', fontsize=15)
pt[0].set_ylabel('Mean Squared Error', fontsize=15)
pt[1].set_ylabel('Mean Squared Error', fontsize=15)
plt.show()
This works fine, and you can run it on your machine to check. It does what I want, but it really seems like too much. I'm asking for advice on how to improve it, with GridSearchCV or anything else. I tried passing PolynomialFeatures to LinearRegression(), but couldn't get x to change on the fly. A working example would be much appreciated.
Something like this seems to be the way to go:
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

pipe = Pipeline(steps=[
    ('poly', PolynomialFeatures(include_bias=False)),
    ('model', LinearRegression()),
])

search = GridSearchCV(
    estimator=pipe,
    param_grid={'poly__degree': list(pows)},
    scoring='neg_mean_squared_error',
    cv=LeaveOneOut(),
)

search.fit(df[['horsepower']], df.mpg)
first = -search.cv_results_['mean_test_score']
(The negation in the last line is because the scorer is negative MSE.)
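As an aside, since GridSearchCV refits the best estimator on the full data by default (refit=True), the fitted search object also exposes the winning configuration directly; a small sketch:

best_degree = search.best_params_['poly__degree']  # degree with the lowest LOOCV MSE
print(best_degree)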
The plotting can then proceed in much the same way. (We're relying here on cv_results_ putting its entries in the same order as pows; you may prefer to plot using the appropriate columns of pd.DataFrame(search.cv_results_).)
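For instance, a minimal sketch of that safer route, assuming the search above has already been fitted (param_poly__degree and mean_test_score are the standard cv_results_ keys):

results = pd.DataFrame(search.cv_results_).sort_values('param_poly__degree')
plt.plot(results['param_poly__degree'], -results['mean_test_score'], color='darkblue', linewidth=1)
plt.scatter(results['param_poly__degree'], -results['mean_test_score'], color='darkblue')
plt.xlabel('Degree of Polynomial')
plt.ylabel('Mean Squared Error')
plt.title('LOOCV')
plt.show()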
You can use RepeatedKFold to emulate the loop over KFold, although then you'd only get one plot. If you really want separate plots, you'll still need the outer loop, but a grid search with cv=KFold(...) can replace the inner loop.
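A minimal sketch of both variants, reusing pipe and pows from above (the search_* names are my own):

from sklearn.model_selection import GridSearchCV, KFold, RepeatedKFold

# One averaged curve: 10-fold CV repeated 9 times inside a single search
search_rep = GridSearchCV(
    estimator=pipe,
    param_grid={'poly__degree': list(pows)},
    scoring='neg_mean_squared_error',
    cv=RepeatedKFold(n_splits=10, n_repeats=9),
)
search_rep.fit(df[['horsepower']], df.mpg)

# Separate curves: the outer loop stays, the grid search replaces the inner loop
second = []
for i in range(9):
    search_kf = GridSearchCV(
        estimator=pipe,
        param_grid={'poly__degree': list(pows)},
        scoring='neg_mean_squared_error',
        cv=KFold(n_splits=10, shuffle=True),
    )
    search_kf.fit(df[['horsepower']], df.mpg)
    second.append(-search_kf.cv_results_['mean_test_score'])

Note that each entry of second is now a whole curve over pows, so the second panel would be plotted as pt[1].plot(pows, np.transpose(second)) (with numpy imported as np).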