Code:
import numpy as np
import pandas as pd
import statsmodels.api as sm

sacramento = pd.read_csv("sacramento.csv")

X = sacramento[["beds", "sqft", "price"]]
Y = sacramento["baths"]
X = sm.add_constant(X)

model = sm.Logit(Y, X).fit()
predictions = model.predict(X)

print_model = model.summary()
print(print_model)
print(model.params.round(2))
print(model.pvalues.round(2))
print('The smallest p-value is for sqft')
The problem I have is with the “You will need to create a new variable from baths, and it should make it such that those observations of 1 bath correspond to a value of 0, and those with more than 1 bath correspond to a 1.” instruction.
I really do not know how to do that. I know that fitting the model with the raw baths column causes ValueError: endog must be in the unit interval.
Link to the csv file: https://drive.google.com/file/d/1A3LQ2vZ9IUkv_2HkqP8c2sCQGAvdII-r/view?usp=sharing
Answer
Can you try this? It recodes baths so that 1 bath becomes 0 and any other value becomes 1:
sacramento["baths"] = sacramento["baths"].apply(lambda x: 0 if x== 1 else 1)
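A vectorized alternative, sketched below, follows the assignment's wording a little more literally. It creates a new 0/1 variable instead of overwriting baths, and it uses "more than 1" as the test. The helper name add_multi_bath_flag and the column name multi_bath are hypothetical choices, and the toy rows are made up purely to show the mapping. They are not taken from sacramento.csv.

```python
import pandas as pd


def add_multi_bath_flag(df: pd.DataFrame, col: str = "baths",
                        new_col: str = "multi_bath") -> pd.DataFrame:
    """Return a copy of df with a 0/1 column: 0 for at most 1 bath, 1 for more than 1."""
    out = df.copy()  # leave the caller's DataFrame unchanged
    # (out[col] > 1) is a boolean Series; astype(int) turns True/False into 1/0
    out[new_col] = (out[col] > 1).astype(int)
    return out


# Toy rows for illustration only (hypothetical values, not from sacramento.csv)
toy = add_multi_bath_flag(pd.DataFrame({"baths": [1, 2, 1, 3]}))
print(toy["multi_bath"].tolist())  # [0, 1, 0, 1]
```

With the real file you would call sacramento = add_multi_bath_flag(sacramento) and then use Y = sacramento["multi_bath"] before sm.Logit(Y, X), so the dependent variable only contains 0 and 1. The lambda above and this version produce the same result whenever every baths value is a whole number of at least 1. They differ only for values such as 0: the lambda maps 0 to 1, while > 1 maps it to 0.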