
Unable to fix “ValueError: DataFrame constructor not properly called!”

I was asked to write a program for Linear Regression with the following steps.

  1. Load the R data set mtcars as a pandas dataframe.
  2. Build another linear regression model by considering the log of independent variable wt, and log of dependent variable mpg.
  3. Fit the model with data, and display the R-squared value

I am a beginner at Statistics with Python.

I tried taking the log values without converting to a new DataFrame, but that gave the error “TypeError: ‘OLS’ object is not subscriptable”.

import statsmodels.api as sa
import statsmodels.formula.api as sfa
import pandas as pd
import numpy as np

cars = sa.datasets.get_rdataset("mtcars")
cars_data = cars.data
lin_mod1 = sfa.ols("wt~mpg",cars_data)
lin_mod2 = pd.DataFrame(lin_mod1)
lin_mod2['wt'] = np.log(lin_mod2['wt'])
lin_mod2['mpg'] = np.log(lin_mod2['mpg'])
lin_res1 = lin_mod2.fit()
print(lin_res1.summary())

The expected result is the regression summary table, but the actual output is an error:

[ValueError: DataFrame constructor not properly called!]


Answer

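The error arises because sfa.ols(...) returns an unfitted model object, not a table of data, so pd.DataFrame cannot be constructed from it (and the model cannot be indexed like a DataFrame either). Instead, apply the log transform inside the formula and fit the model directly. This might work for you.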

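import numpy as np
import statsmodels.api as sm

# Load the R dataset mtcars as a pandas DataFrame
mtcars = sm.datasets.get_rdataset('mtcars')
mtcars_data = mtcars.data

# Take the logs inside the formula: log(mpg) is the dependent variable, log(wt) the independent one
linear_model = sm.formula.ols('np.log(mpg) ~ np.log(wt)', mtcars_data)
linear_result = linear_model.fit()  # fit() returns the results object
print(linear_result.rsquared)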
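
The approach attempted in the question, taking the logs on DataFrame columns first, can also work as long as the logs are applied to the data rather than to the model object. Below is a minimal sketch of that variant, with mpg as the dependent variable as step 2 of the question asks. The helper name fit_log_log and the column names log_wt and log_mpg are made up for illustration, and the six rows typed in are the first six rows of mtcars (mpg and wt only) so the sketch runs without downloading the dataset.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_log_log(data: pd.DataFrame):
    """Fit log(mpg) on log(wt) using explicit log columns (hypothetical helper)."""
    logged = data.copy()  # leave the caller's DataFrame untouched
    logged["log_wt"] = np.log(logged["wt"])
    logged["log_mpg"] = np.log(logged["mpg"])
    # Build the model from the DataFrame, then fit it. Only the fitted
    # results object exposes .rsquared and .summary().
    return smf.ols("log_mpg ~ log_wt", data=logged).fit()


# First six rows of mtcars (mpg and wt only), typed in so this runs offline.
sample = pd.DataFrame({
    "mpg": [21.0, 21.0, 22.8, 21.4, 18.7, 18.1],
    "wt": [2.620, 2.875, 2.320, 3.215, 3.440, 3.460],
})

result = fit_log_log(sample)
print(result.rsquared)
```

With the full dataset, passing the DataFrame returned by get_rdataset("mtcars").data to fit_log_log should give the same R-squared as the formula version, since both fit the same model.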
User contributions licensed under: CC BY-SA