When I run the command:
clf.fit(train_data, train_label)
I get the following error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
The problem is with train_data, an array of size (18000, 20). I have tried this command:
clf.fit(np.float32(train_data), train_label)
or
train_data = np.array([s[0].astype('float32') for s in train_data])
The datasets train_data and train_label are in the train file (a Python pickle) at the following link:
https://www.dropbox.com/s/b3017gi18x6x325/train?dl=0
However, I cannot get all the values in train_data into a form that clf.fit accepts. Any help?
I just found a solution that gets past this error. You need to scale the data:
Code:
from sklearn.ensemble import RandomForestClassifier
import pickle
import numpy as np
from sklearn.preprocessing import scale
with open('train', 'rb') as f:
    train_data, train_label = pickle.load(f)
# Some diagnostics to check for NaNs. No NaNs were found!
print(np.isnan(train_data))
print(np.where(np.isnan(train_data)))
print(np.nan_to_num(train_data))
print(np.isnan(train_label))
print(np.where(np.isnan(train_label)))
# So the data needs to be scaled
train_data = scale(train_data)
clf = RandomForestClassifier()
clf.fit(train_data, train_label)
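The diagnostics above only test for NaN, while the error message also mentions infinity and values too large for float32. A broader check might help pin down which of the three is actually present. The following is a minimal sketch, assuming train_data converts cleanly to a float64 array; report_float32_problems is a hypothetical helper name and the demo array holds made-up values, not the real dataset.

```python
import numpy as np

def report_float32_problems(X):
    """Count entries that would not survive a cast to float32."""
    X = np.asarray(X, dtype=np.float64)
    f32_max = np.finfo(np.float32).max  # about 3.4e38
    finite = np.isfinite(X)
    return {
        "nan": int(np.isnan(X).sum()),
        "inf": int(np.isinf(X).sum()),
        # Finite in float64, but would overflow to inf once cast to float32
        "too_large": int((finite & (np.abs(X) > f32_max)).sum()),
    }

# Toy array standing in for train_data (made-up values)
demo = np.array([[1.0, 2.0], [np.nan, 1e300], [np.inf, -3.0]])
report = report_float32_problems(demo)
print(report)
```

Calling report_float32_problems(train_data) on the real array would show whether any entries exceed the float32 range. If only the too_large count is non-zero, that would be consistent with scaling making the error go away, since standardized values can fall back inside the float32 range.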