Merging results from model.predict() with original pandas DataFrame?

Question

I am trying to merge the results of a predict method back with the original data in a pandas.DataFrame object. To merge these predictions back with the original df, I try this: But that raises: ValueError: Length of values does not match length of index I know I could split the df into train_df and test_df an…
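The question's own code is not preserved here, so the following is only a sketch that reproduces the same error under assumed conditions: the iris data, an 80/20 split, and a decision tree (as in the accepted answer), with `random_state=0` added for repeatability. It shows that `predict(X_test)` returns one value per test row, which is too few to fill a column that spans the whole DataFrame. The exact wording of the error message varies by pandas version.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)  # 150 rows

X_train, X_test, y_train, y_test = train_test_split(
    df, data.target, train_size=0.8, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# One prediction per *test* row only: 30 values for a 150-row DataFrame
y_hats = model.predict(X_test)
print(len(df), len(y_hats))  # 150 30

try:
    df['y_hats'] = y_hats  # 30 values cannot fill a 150-row column
except ValueError as err:
    print('ValueError:', err)
```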

Accepted Answer

Your y_hats length will only be the length of the test data (20%), because you predicted on X_test. Once your model is validated and you're happy with the test predictions (by comparing the model's predictions on X_test with the true values in y_test), you should rerun predict on the full dataset (X). Add these two lines to the bottom:

```python
y_hats2 = model.predict(X)
df['y_hats'] = y_hats2
```

EDIT: Per your comment, here is an updated version that returns the dataset with the predictions appended in the rows that were in the test dataset:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.tree import DecisionTreeClassifier
import pandas as pd

data = load_iris()

# bear with me for the next few steps... I'm trying to walk you through
# how my data object landscape looks... i.e. how I get from raw data
# to matrices with the actual data I have, not the iris dataset

# put feature matrix into columnar format in dataframe
df = pd.DataFrame(data=data.data)
# add outcome variable
df_class = pd.DataFrame(data=data.target)

# finally, split into train-test (the splits keep df's original index labels)
X_train, X_test, y_train, y_test = train_test_split(df, df_class, train_size=0.8)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# I've got my predictions now
y_hats = model.predict(X_test)

# make y_test an explicit copy before adding a column (avoids a possible SettingWithCopyWarning)
y_test = y_test.copy()
y_test['preds'] = y_hats

# left join on the index: rows that were in the training set get NaN in 'preds'
df_out = pd.merge(df, y_test[['preds']], how='left', left_index=True, right_index=True)
```
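As a hedged alternative to the merge, here is a sketch assuming the same iris setup (with `feature_names` used as column labels and `random_state=0` added for repeatability). Because `train_test_split` keeps the original index labels on `X_test`, the test predictions can be wrapped in a Series carrying that index and assigned straight to `df`; pandas aligns the assignment on the index and leaves NaN in the training rows. The column names `preds`, `preds_all`, and `is_test` are illustrative, not from the original answer.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name='target')

X_train, X_test, y_train, y_test = train_test_split(
    df, y, train_size=0.8, random_state=0
)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Label the test predictions with the test rows' original index labels...
preds = pd.Series(model.predict(X_test), index=X_test.index, name='preds')

# ...so this assignment aligns on the index: test rows get a prediction,
# training rows get NaN (no merge needed).
df['preds'] = preds

# Optional: predictions for every row (as in the first part of the answer),
# plus a flag recording which rows were held out for testing.
df['preds_all'] = model.predict(df[data.feature_names])
df['is_test'] = df.index.isin(X_test.index)
```

Keeping the `is_test` flag next to the full-dataset predictions makes it easy to evaluate on held-out rows only, since predictions on rows the model was trained on tend to look better than the model really is.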


Answer