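I am working through the Loan Prediction practice problem and trying to fill the missing values in the data. I got the data from here. I am following this tutorial to solve the problem.
You can find the full code I am using (file name model.py) and the data on GitHub.
The DataFrame looks like this:
After the last line runs (line 122 of model.py), I get the following: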
/home/user/.local/lib/python2.7/site-packages/numpy/lib/arraysetops.py:216: FutureWarning: numpy not_equal will not check object identity in the future. The comparison did not return the same result as suggested by the identity (`is`)) and will change.
  flag = np.concatenate(([True], aux[1:] != aux[:-1]))
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
Traceback (most recent call last):
  File "model.py", line 123, in <module>
    classification_model(model, df,predictor_var,outcome_var)
  File "model.py", line 89, in classification_model
    model.fit(data[predictors],data[outcome])
  File "/usr/local/lib/python2.7/dist-packages/sklearn/linear_model/logistic.py", line 1173, in fit
    order="C")
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 521, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I am getting this error because of the missing values. How do I fill them?
The missing values for Self_Employed and LoanAmount are already filled. How do I fill the rest? Thank you for the help.
You can use fillna and assign the result back to the column:
df['Gender'] = df['Gender'].fillna('no data')
df['Married'] = df['Married'].fillna('no data')
Or, if you need to fill several columns with the same value:
cols = ['Gender','Married']
df[cols] = df[cols].fillna('no data')
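If it is not obvious which columns still contain NaN, the column list can be built from the data instead of typed by hand. The following is only a minimal sketch: the small frame is a hypothetical stand-in for the loan data, and the names `missing` and `remaining` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan data; column names follow the question.
df = pd.DataFrame({'Gender': ['m', 'f', np.nan],
                   'Married': [np.nan, 'yes', 'no'],
                   'LoanAmount': [120.0, 66.0, 141.0]})

# Count missing values per column and keep only the columns that have any.
missing = df.isnull().sum()
cols = missing[missing > 0].index.tolist()

# Fill those columns with the same placeholder and assign the result back.
df[cols] = df[cols].fillna('no data')

# Total number of NaN values left in the frame.
remaining = int(df.isnull().sum().sum())
print(cols, remaining)
```

Running `df.isnull().sum()` on the real data before calling `model.fit` is a quick way to confirm that no predictor column still holds NaN.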
If you need to fill several columns with different values, you can pass a dict that maps column names to replacement values:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Gender':['m','f',np.nan],
                   'Married':[np.nan,'yes','no'],
                   'credit history':[1.,np.nan,0]})
print (df)
  Gender Married  credit history
0      m     NaN             1.0
1      f     yes             NaN
2    NaN      no             0.0
# map each column name to the value that replaces its NaNs
d = {'Gender':'no data', 'Married':'no data', 'credit history':0}
df = df.fillna(d)
print (df)
    Gender  Married  credit history
0        m  no data             1.0
1        f      yes             0.0
2  no data       no             0.0
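The dict does not have to hold hand-picked constants. As a hedged alternative to the 'no data' placeholder, the sketch below builds the dict from the data itself, using the most frequent value for text columns and the median for numeric ones. This is a swapped-in imputation strategy, not something the tutorial or the answer above prescribes, and the sample frame and the names `fill_values` and `filled` are hypothetical. Note that text columns still have to be encoded as numbers before they can be passed to a scikit-learn model.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the same column names as the example above.
df = pd.DataFrame({'Gender': ['m', 'f', 'm', np.nan],
                   'Married': [np.nan, 'yes', 'no', 'yes'],
                   'credit history': [1.0, np.nan, 0.0, 1.0]})

# Build one fill value per column: median for numeric columns,
# most frequent value (mode) for everything else.
fill_values = {}
for col in df.columns:
    if pd.api.types.is_numeric_dtype(df[col]):
        fill_values[col] = df[col].median()
    else:
        fill_values[col] = df[col].mode()[0]

# fillna with a dict fills each column with its own value.
filled = df.fillna(fill_values)
print(filled)
```

Whether the mode or median is a sensible choice depends on the column; for a field such as credit history, a separate "unknown" category may describe the data more honestly.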