Getting "nan" with cross_val_score and StackingClassifier or VotingClassifier

Dushak

I want to use StackingClassifier and VotingClassifier together with StratifiedKFold and cross_val_score. With StackingClassifier or VotingClassifier, cross_val_score returns nan values. If I use any other algorithm in place of StackingClassifier or VotingClassifier, cross_val_score works fine. I am using Python 3.8.5 and scikit-learn 0.23.2.

Update: the code below is now a working example. Please use the Parkinsons dataset from Kaggle; that is the dataset I have been working with, and the following are the exact steps I followed.

import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn import preprocessing
from sklearn import metrics
from sklearn import model_selection
from sklearn import feature_selection

from imblearn.over_sampling import SMOTE

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import StackingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import RandomForestClassifier

import warnings
warnings.filterwarnings('ignore')

dataset = pd.read_csv('parkinsons.csv')


FS_X=dataset.iloc[:,:-1]
FS_y=dataset.iloc[:,-1:]

FS_X.drop(['name'],axis=1,inplace=True)

select_k_best = feature_selection.SelectKBest(score_func=feature_selection.f_classif,k=15)
X_k_best = select_k_best.fit_transform(FS_X,FS_y)

supportList = select_k_best.get_support().tolist()
p_valuesList = select_k_best.pvalues_.tolist()

toDrop=[]

for i, col in enumerate(FS_X.columns):
    if not supportList[i]:
        toDrop.append(col)

FS_X.drop(toDrop,axis=1,inplace=True)        

smote = SMOTE(random_state=7)
Balanced_X,Balanced_y = smote.fit_resample(FS_X,FS_y)  # fit_sample was deprecated and later removed from imblearn
before = pd.merge(FS_X,FS_y,right_index=True, left_index=True)
after = pd.merge(Balanced_X,Balanced_y,right_index=True, left_index=True)
b=before['status'].value_counts()
a=after['status'].value_counts()
print('Before')
print(b)
print('After')
print(a)

SkFold = model_selection.StratifiedKFold(n_splits=10, random_state=7, shuffle=False)

estimators_list = list()

KNN = KNeighborsClassifier()
RF = RandomForestClassifier(criterion='entropy',random_state = 1)
DT = DecisionTreeClassifier(criterion='entropy',random_state = 1)
GNB = GaussianNB()
LR = LogisticRegression(random_state = 1)

estimators_list.append(LR)
estimators_list.append(RF)
estimators_list.append(DT)
estimators_list.append(GNB)

SCLF = StackingClassifier(estimators = estimators_list,final_estimator = KNN,stack_method = 'predict_proba',cv=SkFold,n_jobs = -1)
VCLF = VotingClassifier(estimators = estimators_list,voting = 'soft',n_jobs = -1)

scores1 = model_selection.cross_val_score(estimator = SCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('StackingClassifier Scores',scores1)

scores2 = model_selection.cross_val_score(estimator = VCLF,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('VotingClassifier Scores',scores2)

scores3 = model_selection.cross_val_score(estimator = DT,X=Balanced_X.values,y=Balanced_y.values,scoring='accuracy',cv=SkFold)
print('DecisionTreeClassifier Scores',scores3)

Output

Before
1    147
0     48
Name: status, dtype: int64
After
1    147
0    147
Name: status, dtype: int64
StackingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
VotingClassifier Scores [nan nan nan nan nan nan nan nan nan nan]
DecisionTreeClassifier Scores [0.86666667 0.9        0.93333333 0.86666667 0.96551724 0.82758621
 0.75862069 0.86206897 0.86206897 0.93103448]

I checked some other related posts on Stack Overflow but could not solve my problem. I cannot see where I am going wrong.

Flying Dutchman

The estimators_list you pass to StackingClassifier and VotingClassifier is not correct. The scikit-learn documentation for the estimators parameter of StackingClassifier says:

Base estimators which will be stacked together. Each element of the list is defined as a tuple of string (i.e. name) and an estimator instance. An estimator can be set to 'drop' using set_params.

So the correct list should look like this:

KNN = KNeighborsClassifier()
DT = DecisionTreeClassifier(criterion="entropy")
GNB = GaussianNB()

estimators_list = [("KNN", KNN), ("DT", DT), ("GNB", GNB)]
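To sanity-check that the (name, estimator) form is really the fix, here is a small sketch that reproduces the asker's cross_val_score setup. It uses make_classification as a synthetic stand-in for the Parkinsons data (the shapes and scores are illustrative, not the asker's actual results):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Parkinsons data.
X, y = make_classification(n_samples=300, n_features=15, random_state=7)

# (name, estimator) tuples -- the form both ensembles require.
estimators_list = [
    ("KNN", KNeighborsClassifier()),
    ("DT", DecisionTreeClassifier(criterion="entropy", random_state=1)),
    ("GNB", GaussianNB()),
]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
stack = StackingClassifier(estimators=estimators_list, cv=skf)
vote = VotingClassifier(estimators=estimators_list, voting="soft")

stack_scores = cross_val_score(stack, X, y, scoring="accuracy", cv=skf)
vote_scores = cross_val_score(vote, X, y, scoring="accuracy", cv=skf)
print("Stacking:", stack_scores)  # real accuracies now, no nan
print("Voting:", vote_scores)
```

With the bare-estimator list from the question, the fit inside each fold fails and every fold is scored nan; with the tuples, both ensembles fit and score normally.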

A complete minimal working example with the parkinsons data looks like this:

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

from sklearn.ensemble import StackingClassifier


dataset = pd.read_csv("parkinsons.csv")

FS_X = dataset.drop(["name", "status"], axis=1)
FS_y = dataset["status"]

estimators_list = [("KNN", KNeighborsClassifier()), ("DT", DecisionTreeClassifier(criterion="entropy")), ("GNB", GaussianNB())]

SCLF = StackingClassifier(estimators=estimators_list)

X_train, X_test, y_train, y_test = train_test_split(FS_X, FS_y)
SCLF.fit(X_train, y_train)
print("SCLF: ", accuracy_score(y_test, SCLF.predict(X_test)))
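A debugging tip worth adding: by default, cross_val_score swallows exceptions raised inside each fold and records the fold's score as nan (the error_score parameter defaults to np.nan), which is exactly the silent [nan nan ...] output in the question. Passing error_score="raise" re-raises the underlying exception so you can see what actually broke. A minimal sketch on synthetic stand-in data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data; any small classification set would do.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Bare estimators, as in the question -- missing the (name, estimator) tuples.
bad = VotingClassifier(
    estimators=[DecisionTreeClassifier(), GaussianNB()],
    voting="soft",
)

# error_score="raise" surfaces the real error instead of nan scores.
try:
    cross_val_score(bad, X, y, cv=3, error_score="raise")
    raised = False
except Exception as exc:
    raised = True
    print(type(exc).__name__, "-", exc)
```

Here the fit fails while unpacking the estimators list, and error_score="raise" makes that failure visible immediately rather than leaving a wall of nan values to puzzle over.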

