
How to predict new values using statsmodels.formula.api (python)

I trained a logistic regression model on the breast cancer data, using ONLY one feature, 'mean_area':

from statsmodels.formula.api import logit
logistic_model = logit('target ~ mean_area',breast)
result = logistic_model.fit()

The fitted result has a built-in predict method. However, called without arguments, it returns the predicted values for all the training samples, as follows:

predictions = result.predict()

Suppose I want the prediction for a new value, say 30. How do I use the trained model to output the value (rather than reading the coefficients and computing it manually)?

Answer

You can provide new values to the .predict() method, as illustrated in output #11 in this notebook from the docs for a single observation. You can provide multiple observations as a 2D array, for instance a DataFrame (see docs).

Since you are using the formula API, your input needs to be a pd.DataFrame so that the column names referenced in the formula are available. In your case, you could use something like .predict(pd.DataFrame({'mean_area': [1, 2, 3]})).
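
As a minimal sketch of the above: the original `breast` DataFrame is not shown in the question, so the example below builds a synthetic stand-in with the same column names (`mean_area`, `target`). The generated values and coefficients are assumptions for illustration only, not the real breast cancer data.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Synthetic stand-in for the question's `breast` data (an assumption,
# not the real dataset): larger areas are less likely to have target == 1.
rng = np.random.default_rng(0)
mean_area = rng.uniform(100, 2500, size=300)
prob = 1 / (1 + np.exp(-(4.0 - 0.005 * mean_area)))
breast = pd.DataFrame({"mean_area": mean_area,
                       "target": rng.binomial(1, prob)})

# Same model as in the question; disp=0 only silences the optimizer output.
result = logit("target ~ mean_area", breast).fit(disp=0)

# New observations must use the column name that appears in the formula.
new_data = pd.DataFrame({"mean_area": [30, 500, 1500]})
new_pred = result.predict(new_data)  # predicted P(target == 1) per row
print(new_pred)
```

The returned values are probabilities; to get class labels you would still need to apply a threshold of your choosing, for example `new_pred > 0.5`.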

By default, when no alternative data is provided, statsmodels .predict() uses only the observations that were used for fitting.
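
One way to check this default behaviour is to compare the no-argument call with an explicit call on the training data. This is a sketch on a small synthetic DataFrame (the data and the name `train` are made up for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import logit

# Small synthetic training set (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(100, 2500, size=200)
prob = 1 / (1 + np.exp(-(4.0 - 0.005 * x)))
train = pd.DataFrame({"mean_area": x, "target": rng.binomial(1, prob)})

result = logit("target ~ mean_area", train).fit(disp=0)

in_sample = result.predict()        # no data passed: uses the fitting data
explicit = result.predict(train)    # same rows passed explicitly

# Both calls should give the same fitted probabilities.
same = np.allclose(in_sample, explicit)
print(same)
```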

User contributions licensed under: CC BY-SA