I'd like to understand learning_curve().
Here is what I ran:
X_train1_be.shape
> (1360, 2)
y_train1_be.shape
> (1360, 2)
train_sizes, train_scores, test_scores = learning_curve(grid_best
, X_train1_be
, y_train1_be
, n_jobs=n_jobs
, scoring = 'neg_mean_squared_error'
, cv=TimeSeriesSplit(n_splits = 5)
, verbose=2
, shuffle = False
, train_sizes = [1
, round(len(X_train1_be)/10)
, round(len(X_train1_be)/5)
, round(len(X_train1_be)/3)
, round(len(X_train1_be)/2)
, round(len(X_train1_be)/1)
]
)
But this results in:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-178-9216e6224b3b> in <module>
12 , round(len(X_train1_be)/3)
13 , round(len(X_train1_be)/2)
---> 14 , round(len(X_train1_be)/1)
15 ]
16 )
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in learning_curve(estimator, X, y, groups, train_sizes, cv, scoring, exploit_incremental_learning, n_jobs, pre_dispatch, verbose, shuffle, random_state, error_score)
1257 # use the first 'n_max_training_samples' samples.
1258 train_sizes_abs = _translate_train_sizes(train_sizes,
-> 1259 n_max_training_samples)
1260 n_unique_ticks = train_sizes_abs.shape[0]
1261 if verbose > 0:
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py in _translate_train_sizes(train_sizes, n_max_training_samples)
1341 % (n_max_training_samples,
1342 n_min_required_samples,
-> 1343 n_max_required_samples))
1344
1345 train_sizes_abs = np.unique(train_sizes_abs)
ValueError: train_sizes has been interpreted as absolute numbers of training samples and must be within (0, 230], but is within [1, 1360].
By contrast, the following works:
import numpy as np
from sklearn.model_selection import learning_curve, TimeSeriesSplit

grid_best = grid_result.best_estimator_
train_sizes, train_scores, test_scores = learning_curve(grid_best
, X_train1_be
, y_train1_be
, n_jobs=n_jobs
, scoring = 'neg_mean_squared_error'
, cv=TimeSeriesSplit(n_splits = 5)
, verbose=2
, shuffle = False
, train_sizes = np.linspace(0.001, 1, 10))
> [learning_curve] Training set sizes: [ 1 25 51 76 102 127 153 178 204 230]
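The sizes in that log can be reproduced by hand. This is a sketch, assuming scikit-learn's handling of fractional train_sizes: each fraction is multiplied by the largest allowed training size, truncated to an integer, and clipped to at least 1. The value 230 is taken from the error message above; the variable names are illustrative.

```python
import numpy as np

# Largest allowed training size, taken from the error message: "(0, 230]"
n_max_training_samples = 230

fractions = np.linspace(0.001, 1, 10)
# Scale each fraction, truncate to int, and force at least 1 sample
abs_sizes = np.clip((fractions * n_max_training_samples).astype(int),
                    1, n_max_training_samples)
print(abs_sizes.tolist())  # [1, 25, 51, 76, 102, 127, 153, 178, 204, 230]
```

The fractions are always scaled to 230, never to 1360. That explains why the absolute list, whose last entry is 1360, is rejected.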
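According to this link, the approach I tried first should have worked:
Determining the training set sizes. First, let's decide on the training set sizes we want to use to generate the learning curves. The minimum value is 1. The maximum is given by the number of instances in the training set. Our training set has 9568 instances, so the maximum value is 9568. However, we haven't yet set aside a validation set. We'll do that using an 80:20 ratio, ending up with a training set of 7654 instances (80%) and a validation set of 1914 instances (20%). Given that our training set will have 7654 instances, the maximum value we can use to generate our learning curves is 7654. For our case, we use these six sizes: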
train_sizes = [1,100,500,2000,5000,7654]
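The article's maximum of 7654 is what a plain 5-fold split produces. This assumes the article passed cv=5, which for a regressor means an unshuffled KFold. Every training fold is then about 80% of the data, so the first split is as large as any other. A quick check, using a hypothetical stand-in array with the article's 9568 rows:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.zeros((9568, 1))  # hypothetical stand-in with 9568 rows
train_lengths = [len(train_idx) for train_idx, _ in KFold(n_splits=5).split(X)]
print(train_lengths)  # [7654, 7654, 7654, 7655, 7655]
```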
It seems this was already raised some time ago: github.com/scikit-learn/scikit-learn/issues/7834. That means it is currently not possible, and it doesn't look like this will change soon.
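The limit of 230 comes from the cross-validation splitter, not from the size of the data. learning_curve takes its maximum training size from the training indices of the first CV split, and the traceback comment "use the first 'n_max_training_samples' samples" points to this. With TimeSeriesSplit, the first split is the smallest one. A quick check, using a hypothetical stand-in array with the same shape as X_train1_be:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.zeros((1360, 2))  # hypothetical stand-in with the shape of X_train1_be
train_lengths = [len(train_idx)
                 for train_idx, _ in TimeSeriesSplit(n_splits=5).split(X)]
print(train_lengths)  # [230, 456, 682, 908, 1134]
```

The largest training fold here holds only 1134 rows. A size of 1360 could never be reached with this splitter, even if learning_curve used the largest fold instead of the first.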
One workaround for me was to replicate the dataset, so that the first fold contains the entire original dataset.
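A minimal sketch of that workaround follows, under several assumptions. The data are hypothetical random stand-ins with the shapes of X_train1_be and y_train1_be. LinearRegression replaces grid_best. The number of copies, n_splits + 1, is my choice: it makes each TimeSeriesSplit test fold exactly one copy, so the first training fold is the whole original dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import TimeSeriesSplit, learning_curve

n_splits = 5
rng = np.random.default_rng(0)
# Hypothetical stand-ins with the shapes of X_train1_be and y_train1_be
X = rng.normal(size=(1360, 2))
y = X @ rng.normal(size=(2, 2))

# Stack n_splits + 1 copies of the data one after another
X_rep = np.tile(X, (n_splits + 1, 1))
y_rep = np.tile(y, (n_splits + 1, 1))

cv = TimeSeriesSplit(n_splits=n_splits)
first_train, first_test = next(cv.split(X_rep))
print(len(first_train), len(first_test))  # 1360 1360

# Absolute sizes up to len(X) are now accepted
sizes, train_scores, test_scores = learning_curve(
    LinearRegression(), X_rep, y_rep,
    cv=cv,
    scoring='neg_mean_squared_error',
    shuffle=False,
    train_sizes=[1, round(len(X) / 10), round(len(X) / 5),
                 round(len(X) / 3), round(len(X) / 2), len(X)],
)
print(sizes.tolist())  # [1, 136, 272, 453, 680, 1360]
```

With this layout, every test fold is itself a full copy of the original rows, and those rows also appear in the training part. The resulting validation scores are therefore not measured on unseen data.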