scikit-learn decision tree model evaluation

Posted by Lin Ma in Dev

Lin Ma

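Here are the relevant code and documentation. When cross_val_score is called without explicitly specifying the scoring parameter, does the output array represent accuracy, AUC, or some other metric?

I am using Python 2.7 with the miniconda interpreter.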

http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html

>>> from sklearn.datasets import load_iris
>>> from sklearn.cross_validation import cross_val_score
>>> from sklearn.tree import DecisionTreeClassifier
>>> clf = DecisionTreeClassifier(random_state=0)
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)
array([ 1.     ,  0.93...,  0.86...,  0.93...,  0.93...,
        0.93...,  0.93...,  1.     ,  0.93...,  1.      ])

Regards, Lin

juanpa.arrivillaga

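From the user guide:

By default, the score computed at each CV iteration is the score method of the estimator. It is possible to change this by using the scoring parameter: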
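A minimal sketch of that behaviour, assuming a recent scikit-learn release where cross_val_score is imported from sklearn.model_selection (the question's sklearn.cross_validation module is the older location). It compares the default scores with scoring="accuracy" and with a different metric, "f1_macro", chosen here purely for illustration:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score  # assumption: modern import path
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)

# No scoring argument: each fold is scored with clf.score(), i.e. accuracy.
default_scores = cross_val_score(clf, iris.data, iris.target, cv=10)

# Asking for accuracy explicitly should give the same per-fold numbers.
accuracy_scores = cross_val_score(
    clf, iris.data, iris.target, cv=10, scoring="accuracy"
)

# Any other metric has to be requested by name.
f1_scores = cross_val_score(
    clf, iris.data, iris.target, cv=10, scoring="f1_macro"
)

same_as_accuracy = np.allclose(default_scores, accuracy_scores)
print(same_as_accuracy)
print(default_scores.mean(), f1_scores.mean())
```

If same_as_accuracy is True, the array in the question holds per-fold accuracy, not AUC.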

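From the DecisionTreeClassifier documentation:

Returns the mean accuracy on the given test data and labels. In multi-label classification, this is the subset accuracy, which is a harsh metric since you require for each sample that each label set be correctly predicted.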
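To see why subset accuracy is called harsh, here is a small sketch with made-up multi-label indicator arrays (the data is hypothetical, not from the question). A sample counts as correct only when its whole label row matches:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Two samples, two labels each (toy data for illustration only).
y_true = np.array([[0, 1],
                   [1, 1]])
y_pred = np.array([[1, 1],
                   [1, 1]])

# Subset accuracy: row 0 has one wrong label, so the whole row counts as wrong.
subset_acc = accuracy_score(y_true, y_pred)

# For contrast: the fraction of individual labels that match.
per_label_agreement = np.mean(y_true == y_pred)

print(subset_acc)           # 0.5
print(per_label_agreement)  # 0.75
```

Three of the four individual labels are right, yet only one of the two samples is counted as correct.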

Don't be confused by "mean accuracy"; it's just the regular way one computes accuracy. Follow the links to the source:

    from .metrics import accuracy_score
    return accuracy_score(y, self.predict(X), sample_weight=sample_weight)

Now the source for metrics.accuracy_score:

def accuracy_score(y_true, y_pred, normalize=True, sample_weight=None):
    ...
    # Compute accuracy for each possible representation
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    if y_type.startswith('multilabel'):
        differing_labels = count_nonzero(y_true - y_pred, axis=1)
        score = differing_labels == 0
    else:
        score = y_true == y_pred

    return _weighted_sum(score, sample_weight, normalize)

And if you still aren't convinced:

def _weighted_sum(sample_score, sample_weight, normalize=False):
    if normalize:
        return np.average(sample_score, weights=sample_weight)
    elif sample_weight is not None:
        return np.dot(sample_score, sample_weight)
    else:
        return sample_score.sum()

Note: the normalize parameter of accuracy_score defaults to True, so it simply returns np.average of a boolean NumPy array, which is just the fraction of correct predictions.
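The note can be checked directly. The sketch below uses small made-up label arrays (hypothetical data) and compares accuracy_score with the plain average of the boolean comparison:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Toy single-label predictions (illustrative data only).
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

# Boolean array: True where the prediction is correct.
correct = y_true == y_pred

manual_accuracy = np.average(correct)                         # 4 of 6 correct
library_accuracy = accuracy_score(y_true, y_pred)             # normalize=True by default
correct_count = accuracy_score(y_true, y_pred, normalize=False)

print(manual_accuracy, library_accuracy, correct_count)
```

With normalize=False the function returns the number of correct predictions instead of the fraction.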
