
Variability/randomness of Support Vector Machine model scores in Python’s scikit-learn

I am testing several ML classification models, in this case Support Vector Machines. I have basic knowledge about the SVM algorithm and how it works.

I am using the built-in breast cancer dataset from scikit-learn.

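A sketch of the kind of loading code in question (assuming the standard load_breast_cancer helper; the exact snippet may differ):

    from sklearn.datasets import load_breast_cancer

    # load the built-in breast cancer dataset as a feature matrix X and label vector y
    cancer = load_breast_cancer()
    X, y = cancer.data, cancer.target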

Using the code below:

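A sketch of the kind of training code in question, assuming a train_test_split followed by one SVC per value of C (the concrete C values are placeholders):

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # split into train and test sets; note that no random_state is passed,
    # so the split is different on every run
    X_train, X_test, y_train, y_test = train_test_split(X, y)

    # fit one SVC per regularization value C
    svc1 = SVC(C=0.1).fit(X_train, y_train)
    svc2 = SVC(C=1.0).fit(X_train, y_train)
    svc3 = SVC(C=100.0).fit(X_train, y_train)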

I then print the scores like so:

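Continuing the sketch above:

    # accuracy on the held-out test set, one score per value of C
    print("C=0.1:  ", svc1.score(X_test, y_test))
    print("C=1.0:  ", svc2.score(X_test, y_test))
    print("C=100.0:", svc3.score(X_test, y_test))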

When I run this code, I get a score for each value of the regularization parameter C. However, when I run the .fit lines again (i.e. retrain the models), the scores come out completely different, sometimes drastically so (e.g. 72% vs. 90% for the same value of C).

Where does this variability come from? I thought that, as long as I use the same random_state parameter, the model would always find the same support vectors and hence give me the same results, but since the score changes every time I retrain the model, this is apparently not the case. With logistic regression, for instance, the scores are always consistent no matter how often I rerun the .fit code.

An explanation of this variability in the accuracy score would be of much help!


Answer

Of course. You need to fix random_state to a specific seed, instead of leaving it as None, so that you can reproduce the results.

Otherwise, the default random_state=None is used, and then a new random seed is drawn every time you call the commands; this is where the variability comes from.


Use:

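For example, a sketch assuming the usual train_test_split / SVC workflow (the seed 42 is arbitrary; any fixed integer works):

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # a fixed random_state makes the train/test split (and any internal
    # randomness of the estimator) identical on every run
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    svc = SVC(C=1.0, random_state=42).fit(X_train, y_train)
    print(svc.score(X_test, y_test))  # reproducible score across reruns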