I would like to perform a simple logistic regression (1 dependent, 1 independent variable) in Python. All of the documentation I see about logistic regression in Python is about using it to develop a predictive model. I would like to use it more from the statistics side. How do I find the odds ratio, p-value, and confidence interval of a simple logistic regression in Python?
    X = df[predictor]
    y = df[binary_outcome]
    model = LogisticRegression()
    model.fit(X, y)
    print(...)  # model stats
with an ideal output of the odds ratio, p-value, and confidence interval.
Answer
I assume you are using LogisticRegression() from sklearn. It does not give you p-values or confidence intervals. You can use statsmodels instead. Also note that statsmodels without formulas is a bit different from sklearn (see the comments by @Josef), so you need to add an intercept using sm.add_constant():
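    import numpy as np
    import statsmodels.api as sm

    # Random example data: binary outcome y, continuous predictor x
    y = np.random.choice([0, 1], 50)
    x = np.random.normal(0, 1, 50)

    # add_constant() adds the intercept column
    model = sm.GLM(y, sm.add_constant(x), family=sm.families.Binomial())
    results = model.fit()
    results.summary()

                     Generalized Linear Model Regression Results
    ==============================================================================
    Dep. Variable:                      y   No. Observations:                   50
    Model:                            GLM   Df Residuals:                       48
    Model Family:                Binomial   Df Model:                            1
    Link Function:                  logit   Scale:                          1.0000
    Method:                          IRLS   Log-Likelihood:                -33.125
    Date:                Sat, 09 Jan 2021   Deviance:                       66.250
    Time:                        16:21:51   Pearson chi2:                     50.1
    No. Iterations:                     4
    Covariance Type:            nonrobust
    ==============================================================================
                     coef    std err          z      P>|z|      [0.025      0.975]
    ------------------------------------------------------------------------------
    const         -0.0908      0.309     -0.294      0.769      -0.696       0.514
    x1             0.5975      0.361      1.653      0.098      -0.111       1.306
    ==============================================================================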
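The coefficient is on the log-odds scale, so you can convert it to an odds ratio by exponentiating it. The [0.025 0.975] columns are the 95% confidence interval for the log odds; exponentiating those bounds gives the confidence interval for the odds ratio. The P>|z| column is the p-value. Check out the statsmodels help page for more info.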