Confusing probabilities from predict_proba of scikit-learn's SVM

user3030046

My goal is to draw a precision-recall (PR) curve for a specific class by sorting the samples by their predicted probability for that class. However, the probabilities returned by SVC's predict_proba() behave differently on two standard datasets, iris and digits.

The first case uses the iris dataset with the Python code below. It behaves as expected: the predicted class gets the highest probability.

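from sklearn import datasets
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

D = datasets.load_iris()
# probability=True fits an extra calibration step so that predict_proba() is available
clf = SVC(kernel=chi2_kernel, probability=True).fit(D.data, D.target)
output_predict = clf.predict(D.data)
output_proba = clf.predict_proba(D.data)
output_decision_function = clf.decision_function(D.data)
# proba_to_class is my own helper (definition not shown here)
output_my = proba_to_class(output_proba, clf.classes_)

print(D.data.shape, D.target.shape)
print("target:", D.target[:2])
print("class:", clf.classes_)
print("output_predict:", output_predict[:2])
print("output_proba:", output_proba[:2])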

This produces the output below. The highest probability for each sample matches the output of predict(): 0.97181088 for sample #1 and 0.96961523 for sample #2.

(150, 4) (150,)
target: [0 0]
class: [0 1 2]
output_predict: [0 0]
output_proba: [[ 0.97181088  0.01558693  0.01260218]
[ 0.96961523  0.01702481  0.01335995]]

However, when I switch the dataset to digits with the following change, the probabilities show the opposite behavior: the class returned by predict() has the lowest probability for each sample, 0.00190932 for sample #1 and 0.00220549 for sample #2.

D = datasets.load_digits()

Outputs:

(1797, 64) (1797,)
target: [0 1]
class: [0 1 2 3 4 5 6 7 8 9]
output_predict: [0 1]
output_proba: [[ 0.00190932  0.11212957  0.1092459   0.11262532  0.11150733  0.11208733
   0.11156622  0.11043403  0.10747514  0.11101985]
 [ 0.10991574  0.00220549  0.10944998  0.11288081  0.11178518  0.11234661
   0.11182221  0.11065663  0.10770783  0.11122952]]

I've read this post, which suggests using a linear SVM with decision_function(). However, my task requires the chi-squared kernel for the SVM.

Any solutions?

Andreas Mueller

As the documentation states, there is no guarantee that predict_proba and predict will give consistent results on SVC. You can simply use decision_function. That is true for both linear and kernel SVM.
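
A minimal sketch of that suggestion, assuming the same digits data and chi-squared kernel as in the question. The choice of target_class = 3 and the use of decision_function_shape="ovr" (to get one score column per class) are assumptions made for illustration, and scoring the training data only mirrors the question's setup; held-out data would normally be scored instead.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import precision_recall_curve
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

D = datasets.load_digits()

# No probability=True here: the ranking comes from decision_function instead.
# "ovr" yields one score column per class, ordered like clf.classes_.
clf = SVC(kernel=chi2_kernel, decision_function_shape="ovr").fit(D.data, D.target)
scores = clf.decision_function(D.data)  # shape (n_samples, n_classes)

# Assumption for illustration: draw the PR curve for the digit 3.
target_class = 3
col = list(clf.classes_).index(target_class)

# Binary ground truth for the chosen class versus all other classes.
y_true = (D.target == target_class).astype(int)

# A PR curve only needs a ranking of the samples, so uncalibrated scores suffice.
precision, recall, thresholds = precision_recall_curve(y_true, scores[:, col])
```

The precision and recall arrays can then be passed to any plotting library. The scores are not probabilities, but a PR curve depends only on how the samples are ordered, so no calibration step is needed.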

