For some reason GridSearchCV with an SVC is producing slightly different probability results given the same inputs. In the sample I've posted below the difference is small, but I've had it be much larger on other problems. Shouldn't GridSearchCV produce the same results given the same inputs each time?
See also a previous question by another user, predict_proba or decision_function as estimator "confidence", which addressed a similar issue with logistic regression that turned out to be a bug -- perhaps this is one too? Or does GridSearchCV use a random seed? The difference isn't too bad in this demo problem, but I have other, more complicated problems where the probability difference is enough to predict the other side of a binary state.
from sklearn import svm, grid_search, datasets
iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
svr = svm.SVC(probability=True)
clf = grid_search.GridSearchCV(svr, parameters)
clf.fit(iris.data, iris.target)
clf.predict_proba(iris.data)
array([[ 9.75883684e-01, 1.55259588e-02, 8.59035759e-03],
[ 9.61565216e-01, 2.74888948e-02, 1.09458891e-02],
[ 9.74605121e-01, 1.68928925e-02, 8.50198656e-03],
[ 9.58212635e-01, 2.97479036e-02, 1.20394616e-02],
....
and when I run the exact same code again, I get:
array([[ 9.76047242e-01, 1.54138902e-02, 8.53886802e-03],
[ 9.61893348e-01, 2.72510317e-02, 1.08556202e-02],
[ 9.74780630e-01, 1.67675046e-02, 8.45186573e-03],
[ 9.58586150e-01, 2.94842759e-02, 1.19295743e-02],
....
and I can run again and again and get more different results.
Is this normal for GridSearchCV on an SVC, or am I doing something wrong, or is this a bug?
I am using scikit-learn 0.14.1.
Thank you.
SVMs do not natively produce probabilities. A common workaround is essentially to perform a logistic regression on each data point's margin distance from the SVM decision boundary. If this were done directly on the training data it would be biased, because every support vector sits at a margin distance of exactly ±1. To avoid that bias, the data is split into 3 folds and a CV-like procedure is run to obtain out-of-fold margin values for 1/3 of the data at a time (trained on the other 2/3). The logistic regression is then fit on those margin values, and the SVM is retrained on the whole data set. This is called Platt scaling. The CV part is where the randomness comes in.
I've got a post that has a few 2D examples of it and some more explanation.
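If you want reproducible probabilities, you can fix the seed used by that internal Platt-scaling CV via SVC's random_state parameter. Here is a minimal sketch of your example with a seed pinned (using the modern sklearn.model_selection import path rather than the old grid_search module; the fit_proba helper is just for illustration):

```python
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

def fit_proba(seed):
    # random_state controls the shuffling used for the internal
    # Platt-scaling cross-validation, so a fixed seed should make
    # predict_proba reproducible across runs.
    svr = svm.SVC(probability=True, random_state=seed)
    clf = GridSearchCV(svr, parameters)
    clf.fit(iris.data, iris.target)
    return clf.predict_proba(iris.data)

p1 = fit_proba(0)
p2 = fit_proba(0)
assert np.allclose(p1, p2)  # same seed, same probabilities
```

Note this only makes the randomness repeatable; the probabilities are still calibrated estimates and can differ between seeds.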