I have a simple dataset which looks like this:
    v1   v2   v3   hour_day   sales
    3    4    24   12         133
    5    5    13   12         243
    4    9    3    3          93
    5    12   5    3          101
    4    9    3    6          93
    5    12   5    6          101
I created a simple linear regression (LR) model to predict the target variable "sales", and I used the mean absolute error (MAE) to evaluate it:
    from sklearn import metrics
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split

    # Define the input and target features
    X = df.iloc[:, [0, 1, 2, 3]]
    y = df.iloc[:, 4]

    # Split the data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Train and fit the model
    regressor = LinearRegression()
    regressor.fit(X_train, y_train)

    # Make prediction
    y_pred = regressor.predict(X_test)

    # Evaluate the model
    print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
My code works, but what I want is to predict the sales in X_test and report the MAE separately for each hour of the day. In the example dataset above there are three hour slots: 12, 3, and 6. So the output should look like this:
    MAE for hour 12: 18.29
    MAE for hour 3: 11.67
    MAE for hour 6: 14.43
I think I should use a for loop to iterate over the test rows. It could be something like this:
    from copy import deepcopy

    # Save the hour vector
    hour_vec = deepcopy(X_test['hour_day'])

    for i in range(len(X_test)):
        # Predict one test row at a time (double brackets keep it 2-D)
        y_pred = regressor.predict(X_test.iloc[[i]])
Any idea how to do this?
Answer
    import pandas as pd

    hours = sorted(set(X_test['hour_day']))
    results = pd.DataFrame(index=['MAE'], columns=hours)

    for hour in hours:
        # Boolean mask selecting the test rows for this hour
        idx = X_test['hour_day'] == hour
        y_pred_h = regressor.predict(X_test[idx])
        mae = metrics.mean_absolute_error(y_test[idx], y_pred_h)
        results.loc['MAE', hour] = mae

    # Average of the per-hour MAEs
    results.loc['MAE', 'mean'] = results.mean(axis=1).iloc[0]
    print(results)
prints
                 3          6       mean
    MAE  71.405775  71.405775  71.405775
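The same per-hour breakdown could also be computed without an explicit loop, by grouping the absolute errors by hour. The following is only a minimal sketch: the helper name `mae_by_group` is hypothetical (it is not part of pandas or scikit-learn), and the target values and predictions are made-up numbers used purely for illustration, not output from the model above.

```python
import numpy as np
import pandas as pd


def mae_by_group(y_true, y_pred, groups):
    """Return the mean absolute error for each distinct group label.

    Hypothetical helper: inputs are matched by position, not by index.
    """
    y_true = pd.Series(y_true)
    abs_err = (y_true - np.asarray(y_pred)).abs()
    # Group the per-row absolute errors by label and average each group
    return abs_err.groupby(np.asarray(groups)).mean()


# Illustrative values only (assumed, not taken from a fitted model)
y_test = pd.Series([93.0, 101.0, 93.0, 101.0], index=[2, 3, 4, 5])
hours = pd.Series([3, 3, 6, 6], index=[2, 3, 4, 5])
y_pred = np.array([90.0, 105.0, 100.0, 95.0])

per_hour = mae_by_group(y_test, y_pred, hours)
for hour, mae in per_hour.items():
    print(f"MAE for hour {hour}: {mae:.2f}")
```

With the objects from the question, the call would presumably be `mae_by_group(y_test, regressor.predict(X_test), X_test['hour_day'])`. Only hours that actually occur in the test split will show up in the result.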