
Python and SPSS giving different output for Logistic Regression

Code:

from sklearn.linear_model import LogisticRegression

l = LogisticRegression()
b = l.fit(XT, Y)
print("coeff ", b.coef_)
print("intercept ", b.intercept_)

Here’s the dataset

XT =
[[23]
 [24]
 [26]
 [21]
 [29]
 [31]
 [27]
 [24]
 [22]
 [23]]
Y = [1 0 1 0 0 1 1 0 1 0]

Result:

coeff  [[ 0.00850441]]
intercept  [-0.15184511]

Now I entered the same data in SPSS: Analyze -> Regression -> Binary Logistic Regression, with Y as the Dependent variable and XT as a Covariate. The results weren't even close. Am I missing something in Python or in SPSS? [Screenshots: binary logistic regression result in SPSS, and in Python/sklearn]


Answer

SPSS's logistic regression does not include parameter regularisation in its cost function; it fits a 'raw' maximum-likelihood logistic regression. scikit-learn, by contrast, regularises by default: its cost function includes a regularisation term to prevent overfitting, and you control the inverse of its strength with the C parameter. If you set C to a very high value, the regularisation term becomes negligible and sklearn will closely mimic SPSS. There is no magic number; just set C as high as you can, and there will be effectively no regularisation.
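As a sketch of the idea on the question's own data, the snippet below fits the model twice, once with sklearn's default C=1.0 and once with a very large C; the large-C coefficients are what should line up with SPSS's unregularised estimates (exact numbers depend on your sklearn version and solver):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# The dataset from the question.
XT = np.array([[23], [24], [26], [21], [29], [31], [27], [24], [22], [23]])
Y = np.array([1, 0, 1, 0, 0, 1, 1, 0, 1, 0])

# Default: L2 regularisation with C=1.0 shrinks the coefficient towards 0.
regularised = LogisticRegression().fit(XT, Y)

# Very large C: the penalty term becomes negligible, approximating the
# plain maximum-likelihood fit that SPSS reports.
unregularised = LogisticRegression(C=1e9).fit(XT, Y)

print("regularised:  ", regularised.coef_, regularised.intercept_)
print("unregularised:", unregularised.coef_, unregularised.intercept_)
```

In recent sklearn versions (1.2+) you can also pass `penalty=None` to turn the regularisation off outright instead of relying on a huge C.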

User contributions licensed under: CC BY-SA