
Merging results from model.predict() with original pandas DataFrame?

I am trying to merge the results of a predict method back with the original data in a pandas.DataFrame object.


To merge these predictions back with the original df, I try this:


But that raises:

ValueError: Length of values does not match length of index
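A minimal sketch of why this error appears, assuming a ten-row frame and a 20% test split. The column names and the stand-in prediction array are hypothetical, not the data from the question. Assigning a bare array that is shorter than the frame fails because the array carries no index that pandas could align on:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the original data (10 rows).
df = pd.DataFrame({"feature": np.arange(10.0), "target": [0, 1] * 5})

# Stand-in for model.predict(X_test): only 2 values for a 10-row frame.
y_hats = np.array([0, 1])

try:
    # A plain NumPy array has no row labels, so pandas requires
    # its length to equal the length of the frame.
    df["y_hats"] = y_hats
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message varies slightly between pandas versions, but the cause is the same: the array length must equal the number of rows unless the values come with their own index.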

I know I could split the df into train_df and test_df and this problem would be solved, but in reality I need to follow the path above to create the matrices X and y (my actual problem is a text classification problem in which I normalize the entire feature matrix before splitting into train and test). How can I align these predicted values with the appropriate rows in my df, since the y_hats array is zero-indexed and seemingly all information about which rows were included in the X_test and y_test is lost? Or will I be relegated to splitting dataframes into train-test first, and then building feature matrices? I’d like to just fill the rows included in train with np.nan values in the dataframe.


Answer

Your y_hats array has only as many entries as the test data (20% of the rows) because you predicted on X_test. Once your model is validated and you’re happy with the test predictions (by comparing the model's predictions on X_test against the true values in y_test), you should rerun predict on the full dataset (X). Add these two lines to the bottom:
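The suggestion above can be sketched as follows. This is an illustration under assumptions, not the original snippet: the iris dataset, the DecisionTreeClassifier, the column name "class", and the fixed random_state are all stand-ins chosen to make the example self-contained.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed stand-in data: the iris dataset loaded into a DataFrame.
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["class"] = data.target

# Build the feature matrix and target vector from the full frame.
X = df[data.feature_names].to_numpy()
y = df["class"].to_numpy()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Validate on the held-out rows first.
test_accuracy = model.score(X_test, y_test)

# Then predict on the full matrix: one prediction per row of df,
# in the same row order, so the lengths match.
y_hats2 = model.predict(X)
df["y_hats"] = y_hats2
```

Note that the predictions for the training rows are in-sample, so the new column should not be used to estimate accuracy; use the held-out score for that.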


EDIT: Per your comment, here is an updated version that returns the dataset with the predictions appended to the rows that were in the test dataset:
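One way to do this, sketched under assumptions rather than taken from the original snippet: pass the row labels through train_test_split as an extra array so they are shuffled together with X and y, then attach the predictions by label. The iris data, the classifier, and the names idx_train, idx_test, and df_out are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed stand-in data: the iris dataset loaded into a DataFrame.
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df["class"] = data.target

X = df[data.feature_names].to_numpy()
y = df["class"].to_numpy()

# train_test_split accepts any number of equal-length arrays and splits
# them consistently, so the row labels travel with the rows.
X_train, X_test, y_train, y_test, idx_train, idx_test = train_test_split(
    X, y, df.index.to_numpy(), test_size=0.2, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Wrap the predictions in a Series labelled with the original row index.
y_hats = pd.Series(model.predict(X_test), index=idx_test, name="y_hats")

# DataFrame.join is a left join on the index by default, so rows that
# were in the training set get NaN in the new column.
df_out = df.join(y_hats)
```

Because the column now contains NaN, pandas stores it as float even when the predicted classes are integers.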

User contributions licensed under: CC BY-SA