Bootstrapping with logistic regression in Python: constructing the test vector

Posted by debugcn in Dev

hkj447

I am trying to use bootstrapping to get an averaged estimate of my model's MSE. I am trying to follow this guide: https://machinelearningmastery.com/a-gentle-introduction-to-the-bootstrap-method/ Here is my code:

models_mse = [None]*100
for i in range(100):
  boot = resample(X, replace = True, n_samples = math.floor(len(X) / 2),random_state = 1)
  y_new = [y for y in X if y not in boot]
  X_train, X_test, y_train, y_test = train_test_split(boot, y_new, test_size=0.1, random_state=42)
  reg = LogisticRegression()
  reg.fit(X_train, y_train)
  MSE = mean_squared_error(y_test, reg.predict(X_test))
  models_mse.append(MSE)

This produces the error:

ValueError: Found input variables with inconsistent numbers of samples: [497, 0]

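This means that y_new is empty. As I understand bootstrapping, we treat the sample as the "population" and resample from this pseudo-population with replacement, generating more samples in order to estimate a parameter of the pseudo-population (in my case, the model's MSE). Here, X is some one-hot encoded data. It looks like this: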

array([[1., 0., 0., ..., 0., 0., 1.],
       [1., 0., 1., ..., 1., 0., 0.],
       [1., 0., 1., ..., 0., 0., 1.],
       ...,
       [0., 1., 0., ..., 0., 0., 1.],
       [0., 1., 0., ..., 1., 0., 0.],
       [0., 1., 0., ..., 1., 0., 0.]])

and it is a numpy array:

>>> type(X)
numpy.ndarray

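My questions are as follows: why is y_new empty when I can observe a difference between boot and X? Also, I arbitrarily decided to use half of the original sample as the bootstrap sample size. Is there a more systematic way to choose the number of samples when bootstrapping? Finally, is my overall setup correct for the problem I am trying to solve?

Thanks.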

Edit: after some revisions based on user comments, my new code is:

models_mse = [None]*100
for i in range(100):
  boot = resample(X_train, replace = True, n_samples = X_train.shape[0],random_state = 1)
  reg = LogisticRegression()
  reg.fit(boot, y_train)
  MSE = mean_squared_error(y_test, reg.predict(X_test))
  models_mse.append(MSE)

This does not throw any errors. However, the MSE of every model is exactly the same, which is strange: shouldn't boot be different on each iteration?

Piyush Singh

In your code, y_new is selected from X:

y_new = [y for y in X if y not in boot]

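That is probably not what you intended, since y_new is later passed to train_test_split as the targets. Even with that fixed, it would still not work, because the in operator on a numpy array does not test whether a whole row is present. As the article says, the resample API does not give you the out-of-bag observations for the test set. The good news is that what we want from the API is very simple to implement. Also, you probably do not want to use the same seed (random state) every time you sample.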
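To illustrate the point about in, here is a minimal sketch with made-up arrays (X_demo, boot_demo and y_new_demo are illustrative names, not from the question). On a numpy array, row in boot_demo behaves like (boot_demo == row).any(), so a single matching element is enough for it to be True. With one-hot data, almost every row shares a 0 or a 1 in some position with the bootstrap sample, which would explain the empty y_new.

```python
import numpy as np

# Made-up one-hot-style data, for illustration only
X_demo = np.array([[1., 0., 0.],
                   [0., 1., 0.],
                   [0., 0., 1.]])
# A bootstrap sample that happens to contain only row 0 (twice)
boot_demo = np.array([[1., 0., 0.],
                      [1., 0., 0.]])

row = X_demo[1]  # [0., 1., 0.] is not a row of boot_demo

# `in` broadcasts an element-wise comparison and returns True
# as soon as any single element matches
elementwise_hit = row in boot_demo

# A row-wise membership test compares whole rows instead
rowwise_hit = bool((boot_demo == row).all(axis=1).any())

# The list comprehension from the question therefore keeps nothing
y_new_demo = [r for r in X_demo if r not in boot_demo]

print(elementwise_hit, rowwise_hit, len(y_new_demo))
```

Working with row indices rather than row contents avoids this comparison altogether.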

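# Bootstrap loop that evaluates each model on its out-of-bag rows.
# X (features) and Y (labels) are assumed to be numpy arrays of equal length.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error

models_mse = []
for _ in range(100):
    # draw row indices with replacement; no fixed seed, so each draw differs
    train_idx = np.random.randint(0, len(X), size=len(X))
    # out-of-bag rows: indices that were never drawn
    test_idx = np.setdiff1d(np.arange(len(X)), train_idx)
    X_train, Y_train = X[train_idx], Y[train_idx]
    X_test, Y_test = X[test_idx], Y[test_idx]
    model = LogisticRegression()
    model.fit(X_train, Y_train)
    Y_pred = model.predict(X_test)
    mse = mean_squared_error(Y_test, Y_pred)
    models_mse.append(mse)

print("Bootstrapped MSE={}".format(np.mean(models_mse)))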

I use a training set of the same size as the original dataset X, which is what I usually do. You can change it as needed.
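On the identical MSE values in the question's edit: passing random_state = 1 on every iteration makes resample return the same bootstrap sample each time. The revised loop also resamples only X_train, so the rows of boot no longer line up with y_train. The sketch below uses made-up toy data (X_demo and y_demo are illustrative names) to show both effects with sklearn.utils.resample. Using the loop counter as the seed is just one possible choice, not the only way to vary it.

```python
import numpy as np
from sklearn.utils import resample

# Toy data (made up): the label equals the first feature,
# so feature/label alignment is easy to check afterwards
X_demo = np.arange(20).reshape(10, 2)
y_demo = X_demo[:, 0].copy()

# Fixed seed: two calls return exactly the same bootstrap sample
fixed_a = resample(X_demo, replace=True, n_samples=len(X_demo), random_state=1)
fixed_b = resample(X_demo, replace=True, n_samples=len(X_demo), random_state=1)
same_with_fixed_seed = np.array_equal(fixed_a, fixed_b)

# Seed changes per iteration, and X and y are resampled together
samples = []
for i in range(3):
    X_boot, y_boot = resample(X_demo, y_demo, replace=True,
                              n_samples=len(X_demo), random_state=i)
    samples.append((X_boot, y_boot))

# Each resampled label still belongs to its resampled feature row
rows_stay_aligned = all(np.array_equal(Xb[:, 0], yb) for Xb, yb in samples)
# Different seeds give different bootstrap samples
samples_differ = not np.array_equal(samples[0][0], samples[1][0])

print(same_with_fixed_seed, rows_stay_aligned, samples_differ)
```

Passing random_state=None (the default) also gives a different draw on each call, at the cost of reproducibility.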

Edited on 2021-04-5
