I have a simple dataset which looks like this:
```
v1  v2  v3  hour_day  sales
3   4   24  12        133
5   5   13  12        243
4   9   3   3         93
5   12  5   3         101
4   9   3   6         93
5   12  5   6         101
```
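To make the example reproducible, here is a minimal sketch that loads the table above into a DataFrame. It assumes the name `df` and this column order (v1, v2, v3, hour_day, sales), which is what the `iloc` positions in the model code rely on.

```python
import pandas as pd

# The example table from the question, typed in by hand.
# Column order matters: iloc[:, [0, 1, 2, 3]] are the features, iloc[:, 4] is sales.
df = pd.DataFrame({
    'v1': [3, 5, 4, 5, 4, 5],
    'v2': [4, 5, 9, 12, 9, 12],
    'v3': [24, 13, 3, 5, 3, 5],
    'hour_day': [12, 12, 3, 3, 6, 6],
    'sales': [133, 243, 93, 101, 93, 101],
})
print(df)
```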
I created a simple linear regression (LR) model to train on and predict the target variable "sales", and I used MAE to evaluate the model:
```python
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# df holds the table above
# Define the input features (v1, v2, v3, hour_day) and the target (sales)
X = df.iloc[:, [0, 1, 2, 3]]
y = df.iloc[:, 4]

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train and fit the model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Make predictions
y_pred = regressor.predict(X_test)

# Evaluate the model
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
```
My code works, but what I want is the MAE of the sales predictions on X_test computed separately for each hour of the day. In the example dataset above there are three hour slots (12, 3, and 6), so the output should look like this:
```
MAE for hour 12: 18.29
MAE for hour 3: 11.67
MAE for hour 6: 14.43
```
I think I should use a for loop to iterate. It could be something like this:
```python
from copy import deepcopy

# Save the hour vector
hour_vec = deepcopy(X_test['hour_day'])

for i in range(len(X_test)):
    # Predict one test row at a time; X_test.iloc[[i]] keeps it as a one-row DataFrame
    y_pred = regressor.predict(X_test.iloc[[i]])
```
Any ideas on how to do this?
Answer
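```python
import pandas as pd
from sklearn import metrics

# Unique hours present in the test set
hours = sorted(set(X_test['hour_day']))
# One row ('MAE'), one float column per hour
results = pd.DataFrame(index=['MAE'], columns=hours, dtype=float)

for hour in hours:
    # Boolean mask selecting the test rows for this hour
    idx = X_test['hour_day'] == hour
    y_pred_h = regressor.predict(X_test[idx])
    mae = metrics.mean_absolute_error(y_test[idx], y_pred_h)
    results.loc['MAE', hour] = mae

# Average of the per-hour MAEs (.iloc[0] takes the single row positionally)
results.loc['MAE', 'mean'] = results.mean(axis=1).iloc[0]
print(results)
```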
which prints:
```
             3          6       mean
MAE  71.405775  71.405775  71.405775
```
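A possible alternative to the loop (a sketch, not part of the answer above): predict the whole test set once, keep each row's absolute error, and average those errors per hour with pandas `groupby`. The mean of the absolute errors within a group is exactly that group's MAE, so this should match the loop's per-hour values. It assumes the example data from the question and prints in the requested `MAE for hour …` format. With only 6 rows and `test_size=0.2`, the test set holds just 2 rows, so only the hours that land in it get a value (which is also why hour 12 is missing from the output above).

```python
import pandas as pd
from sklearn import metrics
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Example data from the question
df = pd.DataFrame({
    'v1': [3, 5, 4, 5, 4, 5],
    'v2': [4, 5, 9, 12, 9, 12],
    'v3': [24, 13, 3, 5, 3, 5],
    'hour_day': [12, 12, 3, 3, 6, 6],
    'sales': [133, 243, 93, 101, 93, 101],
})

X = df.iloc[:, [0, 1, 2, 3]]
y = df.iloc[:, 4]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict the whole test set once; keep the absolute error of each row (indexed like y_test)
abs_err = (y_test - regressor.predict(X_test)).abs()

# Group the errors by the hour of the matching test row; the mean per group is that hour's MAE
mae_by_hour = abs_err.groupby(X_test['hour_day']).mean()

for hour, mae in mae_by_hour.items():
    print(f'MAE for hour {hour}: {mae:.2f}')
```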