Python scikit-learn random forest predict_proba outputs rounded values

Posted by debugcn in Dev

alias_neo92

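I am using a random forest in scikit-learn for classification and to get class probabilities, so I used the predict_proba function. But the probabilities it outputs are rounded to the first decimal place.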

I tried it with the sample iris dataset:

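import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
# Randomly mark about 75% of the rows as training data
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
# iris.target holds integer codes, so build the categorical from codes
df['species'] = pd.Categorical.from_codes(iris.target, iris.target_names)
df.head()

train, test = df[df['is_train']], df[~df['is_train']]

features = df.columns[:4]
# n_estimators is left at its default (10 trees in the version used here)
clf = RandomForestClassifier(n_jobs=2)
# Encode the species names as integer labels
y, _ = pd.factorize(train['species'])
clf.fit(train[features], y)
clf.predict_proba(train[features])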

Output probabilities (excerpt):

   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 1. ,  0. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  0.8,  0.2],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],
   [ 0. ,  1. ,  0. ],

Is this the default output? Can the number of decimal places be increased?

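Note: I found the solution. The default number of trees is 10. After increasing the number of trees to several hundred, the precision of the probabilities increased.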

调用外壳

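Apparently the default is ten trees, and your code uses that default (this was the default at the time; since scikit-learn 0.22 the default n_estimators is 100):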

Parameters: 
n_estimators : integer, optional (default=10)
The number of trees in the forest.

Try something like this, raising the number of trees to 25 or anything above 10:

RandomForestClassifier(n_estimators=25, n_jobs=2)

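If you are only getting the vote proportions of the default 10 trees, that is very likely what produces the probabilities you are seeing: a fully grown tree usually ends in pure leaves, so each tree contributes 0 or 1 to each class, and the average over 10 trees can only be a multiple of 0.1.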
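A minimal sketch of that effect, assuming a recent scikit-learn: it passes n_estimators explicitly (the default is 100 in current versions), fits on the full iris data rather than a train split, and uses random_state=0 only for reproducibility. The helper forest_proba is hypothetical. Multiplying the probabilities by the number of trees should give whole-number vote counts, which is why 10 trees yield values such as 0.8 and 0.2.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)


def forest_proba(n_trees, seed=0):
    """Hypothetical helper: fit a forest with n_trees trees and return predict_proba on X."""
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(X, y)
    return clf.predict_proba(X)


results = {}
for n_trees in (10, 25):
    proba = forest_proba(n_trees)
    results[n_trees] = proba
    # Fully grown trees usually end in pure leaves, so each tree votes 1.0 for
    # one class and 0.0 for the others; scaling by n_trees recovers vote counts.
    votes = proba * n_trees
    print(n_trees, "trees, distinct probabilities:", np.unique(proba.round(6)))
    print("  vote counts are whole numbers:", np.allclose(votes, np.round(votes)))
```

With more trees the grid of possible values gets finer (steps of 1/n_estimators), which matches the observation in the question that several hundred trees give more precise probabilities.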

You may also run into problems because the iris dataset is very small (it has only 150 observations).

The documentation for predict_proba() says:

The predicted class probabilities of an input sample is computed as the
mean predicted class probabilities of the trees in the forest. The class
probability of a single tree is the fraction of samples of the same 
class in a leaf.
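To check the documented behaviour, the sketch below recomputes the forest's probabilities as the plain mean of the per-tree predict_proba outputs, using the fitted trees stored in clf.estimators_. It assumes a recent scikit-learn and NumPy array input; n_estimators=10 and random_state=0 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Forest probabilities as reported by scikit-learn.
forest_proba = clf.predict_proba(X)

# The same quantity computed by hand: stack the class probabilities of each
# fitted tree (shape: n_trees x n_samples x n_classes) and average over trees.
per_tree = np.stack([tree.predict_proba(X) for tree in clf.estimators_])
manual_proba = per_tree.mean(axis=0)

print("forest output equals mean of tree outputs:",
      np.allclose(forest_proba, manual_proba))
```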

I did not find any parameter in the documentation for adjusting the decimal precision of the predicted probabilities.
