I was asked to write a program for Linear Regression with the following steps.
- Load the R data set mtcars as a pandas dataframe.
- Build another linear regression model by considering the log of independent variable wt, and log of dependent variable mpg.
- Fit the model with data, and display the R-squared value
I am a beginner at Statistics with Python.
I first tried taking the log values without converting to a new DataFrame, but that gave the error "TypeError: 'OLS' object is not subscriptable".
import statsmodels.api as sa
import statsmodels.formula.api as sfa
import pandas as pd
import numpy as np
cars = sa.datasets.get_rdataset("mtcars")
cars_data = cars.data
lin_mod1 = sfa.ols("wt~mpg",cars_data)
lin_mod2 = pd.DataFrame(lin_mod1)
lin_mod2['wt'] = np.log(lin_mod2['wt'])
lin_mod2['mpg'] = np.log(lin_mod2['mpg'])
lin_res1 = lin_mod2.fit()
print(lin_res1.summary())
The expected result is the regression summary table, but the actual output is an error:
[ValueError: DataFrame constructor not properly called!]
Answer
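This might work for you. Both errors come from treating the model as if it were data: sfa.ols(...) returns an OLS model object, not a DataFrame, so it cannot be indexed with ['wt'] or passed to pd.DataFrame. Apply the log transform inside the formula instead, then fit the model.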
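import statsmodels.api as sm
import numpy as np

# Load the R data set mtcars as a pandas DataFrame
mtcars = sm.datasets.get_rdataset('mtcars')
mtcars_data = mtcars.data

# Dependent variable log(mpg) on the left, independent variable log(wt) on the right
linear_model = sm.formula.ols('np.log(mpg) ~ np.log(wt)', mtcars_data)
linear_result = linear_model.fit()

# R-squared of the fitted model
print(linear_result.rsquared)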
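If you would rather transform the columns yourself, as the code in the question tried to do, a minimal sketch could look like the one below. The point is to add the log columns to the DataFrame and only then build the model. The column names log_mpg and log_wt are my own choice, not anything statsmodels requires. To keep the sketch runnable without a download, it uses only the first six rows of mtcars typed in by hand, so its R-squared will differ from the full data set; with the real data you would use the DataFrame returned by get_rdataset instead.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Assumption: first six rows of mtcars (mpg, wt), entered by hand so that
# no download is needed. Use the full get_rdataset("mtcars").data in practice.
cars_data = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
    "wt": [2.620, 2.875, 2.320, 3.215, 3.440, 3.460],
})

# Transform the data, not the model: add log columns to the DataFrame.
# log_mpg and log_wt are illustrative names.
cars_data["log_mpg"] = np.log(cars_data["mpg"])
cars_data["log_wt"] = np.log(cars_data["wt"])

# Build the model on the transformed columns, then fit it.
log_model = smf.ols("log_mpg ~ log_wt", data=cars_data)
log_result = log_model.fit()

rsquared = log_result.rsquared
print(rsquared)
```

Both approaches fit the same regression; writing np.log(...) inside the formula just saves you the extra columns.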