I was asked to write a linear regression program with the following steps:
- Load the R data set mtcars as a pandas DataFrame.
- Build another linear regression model using the log of the independent variable wt and the log of the dependent variable mpg.
- Fit the model to the data and display the R-squared value.
I am a beginner at statistics with Python.
I first tried taking the log values without converting to a new DataFrame, but that raised "TypeError: 'OLS' object is not subscriptable".
import statsmodels.api as sa
import statsmodels.formula.api as sfa
import pandas as pd
import numpy as np

cars = sa.datasets.get_rdataset("mtcars")
cars_data = cars.data
lin_mod1 = sfa.ols("wt~mpg",cars_data)
lin_mod2 = pd.DataFrame(lin_mod1)
lin_mod2['wt'] = np.log(lin_mod2['wt'])
lin_mod2['mpg'] = np.log(lin_mod2['mpg'])
lin_res1 = lin_mod2.fit()
print(lin_res1.summary())
The expected result is the regression summary table, but the actual output is an error:
ValueError: DataFrame constructor not properly called!
Answer
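ols() returns a model object, not data, which is why pd.DataFrame(lin_mod1) fails and why the model cannot be indexed like a DataFrame. Apply the log transform inside the formula with np.log instead, then call fit() on the model. This might work for you.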
import statsmodels.api as sm
import numpy as np

# Load mtcars from the R datasets collection; .data is a pandas DataFrame
mtcars = sm.datasets.get_rdataset('mtcars')
mtcars_data = mtcars.data
# Formula is "dependent ~ independent": log(mpg) explained by log(wt)
linear_model = sm.formula.ols('np.log(mpg) ~ np.log(wt)', mtcars_data)
linear_result = linear_model.fit()
print(linear_result.rsquared)
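If you want to sanity-check what rsquared measures, here is a rough standard-library sketch that computes R-squared for a log-log fit by hand. It is not how statsmodels computes it internally. The helper name r_squared is made up for illustration, and only the first eight rows of mtcars are hard-coded so it runs offline, so the number will not match the full-data result.

```python
import math

# First eight rows of mtcars (mpg, wt), hard-coded so the sketch runs offline.
# Assumption: a subset is enough to illustrate the calculation.
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4]
wt = [2.620, 2.875, 2.320, 3.215, 3.440, 3.460, 3.570, 3.190]


def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x.

    Hypothetical helper for illustration, not a statsmodels function.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Take logs of both variables, as the task asks
log_wt = [math.log(v) for v in wt]
log_mpg = [math.log(v) for v in mpg]

r2 = r_squared(log_wt, log_mpg)          # log(mpg) ~ log(wt)
r2_swapped = r_squared(log_mpg, log_wt)  # log(wt) ~ log(mpg)
print(r2, r2_swapped)
```

With a single predictor, both directions print the same R-squared (it equals the squared correlation), but the slope and intercept differ, so the order in the formula still matters when you read the coefficients.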