Python

Code:

import numpy as np
import pandas as pd
import statsmodels.api as sm

sacramento = pd.read_csv("sacramento.csv")

X = sacramento[["beds", "sqft", "price"]]
Y = sacramento["baths"]

X = sm.add_constant(X)

model = sm.Logit(Y, X).fit()
predictions = model.predict(X)

print_model = model.summary()
print(print_model)

print(model.params.round(2))
print(model.pvalues.round(2))
print('The smallest p-value is for sqft')

My problem is with this instruction: “You will need to create a new variable from baths, and it should make it such that those observations of 1 bath correspond to a value of 0, and those with more than 1 bath correspond to a 1.”

I do not know how to do that. Without that step, the code above fails with ValueError: endog must be in the unit interval.
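To see where the error comes from, it can help to list the response values that fall outside the range from 0 to 1. The snippet below is only a sketch: the `sample` frame and its numbers are made up to stand in for a few rows of sacramento.csv, so the real file may show different values.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of sacramento.csv; the values are made up.
sample = pd.DataFrame({"baths": [1, 2, 1, 3, 2]})

# sm.Logit expects every response value to lie in [0, 1].
# Collect the values that break that requirement.
outside = sample.loc[~sample["baths"].between(0, 1), "baths"]

print(sorted(outside.unique().tolist()))  # [2, 3]
```

Any value printed here is one that the response variable would need to have recoded before fitting.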

Link to the csv file: https://drive.google.com/file/d/1A3LQ2vZ9IUkv_2HkqP8c2sCQGAvdII-r/view?usp=sharing

Answer

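Logit requires the response to lie between 0 and 1, and baths contains values greater than 1, which is what triggers the error. Try recoding it before the line that sets Y:

sacramento["baths"] = sacramento["baths"].apply(lambda x: 0 if x == 1 else 1)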
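The recoding above overwrites baths in place. Since the instruction asks for a new variable, one alternative is a vectorized comparison stored in a separate column. This is a sketch under assumptions: the column name `multi_bath` is invented here, and the small frame stands in for the real CSV.

```python
import pandas as pd

# Hypothetical stand-in rows; the real data would come from sacramento.csv.
sacramento = pd.DataFrame({"baths": [1, 2, 1, 3, 2]})

# New column (the name "multi_bath" is an assumption):
# 1 when the home has more than one bath, 0 otherwise.
sacramento["multi_bath"] = (sacramento["baths"] > 1).astype(int)

# Use the new column as the response; the original baths column is untouched.
Y = sacramento["multi_bath"]

print(Y.tolist())  # [0, 1, 0, 1, 1]
```

The two versions differ on edge cases: the lambda maps every value other than exactly 1 to 1, including values below 1 or missing entries, while the comparison maps those to 0. Which behavior is wanted depends on what the data actually contains.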

Python statsmodels – ValueError: how to create variable in range 0 to 1?
