Scikit-learn SVC always giving accuracy 0 on random data cross validation

Shovalt

In the following code I create a random data set of 50 samples, each with 20 features. I then generate a random target vector composed of half True and half False values.

All of the values are stored in Pandas objects, since this simulates a real scenario in which the data will be given in that way.

I then perform a manual leave-one-out inside a loop: each time I select an index, drop its respective data, fit a default SVC on the remaining data, and finally run a prediction on the left-out sample.

import random
import numpy as np
import pandas as pd
from sklearn.svm import SVC

n_samp = 50
m_features = 20

# Random feature matrix, wrapped in a DataFrame
X_val = np.random.rand(n_samp, m_features)
X = pd.DataFrame(X_val, index=range(n_samp))

# Random, perfectly balanced target vector, wrapped in a Series
y_val = [True] * (n_samp // 2) + [False] * (n_samp // 2)
random.shuffle(y_val)
y = pd.Series(y_val, index=range(n_samp))

success_count = 0
for idx in y.index:
    clf = SVC()  # Can be inside or outside the loop. Result is the same.

    # Leave-one-out for the fitting phase
    loo_X = X.drop(idx)
    loo_y = y.drop(idx)
    clf.fit(loo_X.values, loo_y.values)

    # Make a prediction on the sample that was left out
    pred_X = X.loc[idx:idx]
    pred_result = clf.predict(pred_X.values)
    print(y.loc[idx], pred_result[0])  # Actual value vs. predicted value - always opposite!
    is_success = y.loc[idx] == pred_result[0]
    success_count += 1 if is_success else 0

print('\nSuccess Count:', success_count)  # Almost always 0!

Now here's the strange part - I expect to get an accuracy of about 50%, since this is random data, but instead I almost always get exactly 0! I say almost always, since roughly once every 10 runs of this exact code I get a few correct hits.

What's really crazy to me is that if I choose the opposite of every predicted answer, I get 100% accuracy. On random data!

What am I missing here?

Shovalt

Ok, I think I just figured it out! It all comes down to our old machine learning foe - the majority class.

In more detail: I chose a target comprising 25 True and 25 False values - perfectly balanced. Leaving one sample out creates a class imbalance in every training fold, say 24 True and 25 False. Since the SVC was left at its default parameters and run on random data, it probably couldn't find any way to predict the result other than choosing the majority class of the training fold - which in this iteration would be False! And because the left-out sample always belongs to whichever class is now in the minority, the imbalance is turned against it on every single iteration.
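A quick way to sanity-check this hypothesis is to count how often the left-out prediction equals the majority class of the remaining 49 training labels. Here is a minimal sketch (the fixed seed and the value_counts-based majority check are added here just for illustration, they are not part of the original code):

import numpy as np
import pandas as pd
from sklearn.svm import SVC

rng = np.random.RandomState(0)  # fixed seed only for reproducibility
n_samp, m_features = 50, 20

X = pd.DataFrame(rng.rand(n_samp, m_features), index=range(n_samp))
y_val = np.array([True] * (n_samp // 2) + [False] * (n_samp // 2))
rng.shuffle(y_val)
y = pd.Series(y_val, index=range(n_samp))

matches_majority = 0
for idx in y.index:
    loo_X, loo_y = X.drop(idx), y.drop(idx)
    clf = SVC().fit(loo_X.values, loo_y.values)
    pred = clf.predict(X.loc[idx:idx].values)[0]
    # The majority class of the 49 remaining labels is always the
    # opposite of the label that was just dropped (24 vs. 25 split).
    majority = loo_y.value_counts().idxmax()
    matches_majority += int(pred == majority)

print('Predictions equal to training-fold majority:',
      matches_majority, 'out of', n_samp)

If the explanation above is right, that count should be at (or very near) 50, i.e. the classifier is effectively echoing the training-fold majority, which by construction is always the wrong answer for the left-out sample.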

All in all - a good lesson in machine learning, and an excellent mathematical riddle to share with your friends :)
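For what it's worth, the same near-zero accuracy shows up if the manual loop is replaced with scikit-learn's built-in LeaveOneOut splitter; a rough equivalent (the seed and shapes here are arbitrary, chosen only to mirror the setup above):

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(50, 20)
y = rng.permutation([True] * 25 + [False] * 25)

# Each fold holds out one sample, so each fold's score is simply
# whether that single prediction was correct (0.0 or 1.0).
scores = cross_val_score(SVC(), X, y, cv=LeaveOneOut())
print('LOO accuracy:', scores.mean())  # close to 0 for the reason above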
