I’m currently training my first neural network on a larger dataset. I have split my training data into several .npy binary files, each containing a batch of 20k training samples. I load the data from the .npy files, apply some simple pre-processing operations, and then train my network by calling the partial_fit method several times in a loop:
for i in range(50):
    nnmodel.partial_fit(X_tr, Y_tr)
I have already read that the regular .fit() method cannot train incrementally on multiple batches, but that partial_fit should be able to. My first training run always goes well: the loss decreases and I get nice fitting results, so I save my model using the joblib.dump method.
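Stripped down, the first run looks roughly like this (the file names and the pre-processing step are simplified placeholders):

import numpy as np
from joblib import dump
from sklearn.neural_network import MLPRegressor

# Hypothetical file names -- my real batch files are named differently
X_tr = np.load("batch_00_X.npy")  # one batch of 20k samples
Y_tr = np.load("batch_00_Y.npy")

# ... simple pre-processing of X_tr / Y_tr here ...

nnmodel = MLPRegressor(verbose=True)
for i in range(50):
    nnmodel.partial_fit(X_tr, Y_tr)

dump(nnmodel, "nnmodel.joblib")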
For the next run I use exactly the same script: it loads my data from the .npy files (it doesn’t matter whether I feed the same batch or a different one), pre-processes it, this time loads my pre-trained model with joblib.load, and starts the partial_fit loop again.
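The second run differs only in loading the saved model instead of creating a new one (again simplified):

import numpy as np
from joblib import load

# Same or a different batch -- the result is the same either way
X_tr = np.load("batch_01_X.npy")
Y_tr = np.load("batch_01_Y.npy")

# ... same pre-processing as in the first run ...

nnmodel = load("nnmodel.joblib")
for i in range(50):
    nnmodel.partial_fit(X_tr, Y_tr)  # loss stays constant from here on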
What I always get in the second run is a constant loss over all iterations; the error does not decrease anymore, no matter which dataset I use:
Iteration 51, loss = 3.93268978
Iteration 52, loss = 3.93268978
Iteration 53, loss = 3.93268978
Iteration 54, loss = 3.93268978
What am I doing wrong here? Thanks in advance!
Answer
There are several possibilities.
- The model may have converged
- There may not be enough passes over the batches (in the example below the model doesn’t converge until ~500 iterations)
- (Need more info) joblib.dump and joblib.load may be saving or loading the model in an unexpected way; see the round-trip check after this list
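For the third point, one way to narrow it down (a minimal sketch, not part of the original setup) is to verify that a dump/load round trip preserves the weights and that training can continue afterwards:

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1000, random_state=42)
regr = MLPRegressor().partial_fit(X, y)

dump(regr, "checkpoint.joblib")      # hypothetical checkpoint file
restored = load("checkpoint.joblib")

# The weights should survive the round trip bit-for-bit
for w_orig, w_restored in zip(regr.coefs_, restored.coefs_):
    assert np.array_equal(w_orig, w_restored)

# Training should continue from the saved state: the loss should
# keep moving, not freeze at a constant value
before = restored.loss_
restored.partial_fit(X, y)
print(before, "->", restored.loss_)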
Instead of calling a script multiple times and dumping the results between iterations, it might be easier to debug if initializing/preprocessing/training/visualizing all happens in one script. Here is a minimal example:
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
X, y = make_regression(n_samples=10000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y)
regr = MLPRegressor()
losses = []
test_performance = []
for _ in range(100):
    # Make 100 passes over the batches
    for batch in range(500, 7501, 500):
        # Perform partial fits on batches of 500 examples
        # Simulate batches; these could also be loaded from `.npy`
        X_train_batch = X_train[batch-500:batch]
        y_train_batch = y_train[batch-500:batch]
        regr.partial_fit(X_train_batch, y_train_batch)
        losses.append(regr.loss_)
        test_performance.append(regr.score(X_test, y_test))
# Plotting results:
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.title.set_text("Training Loss")
ax2.title.set_text("Score on test set")
ax1.plot(range(len(losses)), losses)
ax2.plot(range(len(test_performance)), test_performance)
plt.show()
Output: