Beta coefficients and p-value with Logistic Regression in Python

I would like to perform a simple logistic regression (1 dependent, 1 independent variable) in Python. All of the documentation I see about logistic regression in Python is about using it to develop a predictive model. I would like to use it more from the statistics side. How do I find the odds ratio, p-value, and confidence interval of a simple logistic regression in Python?

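from sklearn.linear_model import LogisticRegression

X = df[[predictor]]  # 2-D feature matrix, assuming predictor is a single column name
y = df[binary_outcome]

model = LogisticRegression()
model.fit(X, y)

# print(model_stats)  # placeholder for the statistics I would like to see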

with an ideal output of odds ratio, p-value, and confidence interval.

Answer

I assume you are using LogisticRegression() from sklearn. It does not give you p-values or confidence intervals. You can use statsmodels instead. Note that statsmodels without formulas behaves a bit differently from sklearn (see comments by @Josef): it does not add an intercept automatically, so you need to add one using sm.add_constant():

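import numpy as np
import statsmodels.api as sm

# Toy data: a random binary outcome and one normally distributed predictor
y = np.random.choice([0, 1], 50)
x = np.random.normal(0, 1, 50)

# add_constant() prepends the intercept column that sm.GLM does not add itself
model = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial())
results = model.fit()
print(results.summary())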

Generalized Linear Model Regression Results
Dep. Variable:    y                 No. Observations:  50
Model:            GLM               Df Residuals:      48
Model Family:     Binomial          Df Model:          1
Link Function:    logit             Scale:             1.0000
Method:           IRLS              Log-Likelihood:    -33.125
Date:             Sat, 09 Jan 2021  Deviance:          66.250
Time:             16:21:51          Pearson chi2:      50.1
No. Iterations:   4
Covariance Type:  nonrobust

        coef     std err  z       P>|z|  [0.025  0.975]
const   -0.0908  0.309    -0.294  0.769  -0.696  0.514
x1      0.5975   0.361    1.653   0.098  -0.111  1.306

The coefficients are in log odds; exponentiate them to get odds ratios. The [0.025 0.975] columns are the bounds of the 95% confidence interval for the log odds, so exponentiating those bounds gives the confidence interval for the odds ratio. Check out the help page for more info.

User contributions licensed under: CC BY-SA