I’m currently training my first neural network on a larger dataset. I have split my training data into several .npy binary files, each containing a batch of 20k training samples. I load the data from the .npy files, apply some simple pre-processing operations, and then train my network by calling the partial_fit
method several times in a loop:
```python
for i in range(50):
    nnmodel.partial_fit(X_tr, Y_tr)
```
I have already read that the regular .fit()
method cannot train on multiple batches, but partial_fit, in contrast, should be able to. My first training run always goes well: the loss decreases and I get nice fitting results, so I save my model using the joblib.dump
method.
For the next call I use exactly the same script again: it loads my data from the .npy files (it doesn’t matter whether I feed the same batch or another one), pre-processes it, this time loads my pre-trained model with joblib.load
, and starts the partial_fit
loop again.
What I always get in the second run is a constant loss over all iterations; the error does not decrease anymore, no matter what dataset I use:

```
Iteration 51, loss = 3.93268978
Iteration 52, loss = 3.93268978
Iteration 53, loss = 3.93268978
Iteration 54, loss = 3.93268978
...
```
What am I doing wrong here? Thanks already!
Answer
There are several possibilities.
- The model may have converged.
- There may not have been enough passes over the batches (in the example below, the model doesn’t converge until ~500 iterations).
- (Need more info) joblib.dump and joblib.load may be saving or loading the model in an unexpected way.
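To rule out the third possibility, a quick sanity check is to round-trip a partially trained model through joblib and confirm that training can continue from where it stopped. Here is a minimal sketch, using synthetic data and an in-memory buffer as stand-ins for the .npy batches and the model file on disk:

```python
from io import BytesIO

import joblib
from sklearn.datasets import make_regression
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=1000, random_state=42)

# Partially train a model, as in the first run of the script
regr = MLPRegressor(random_state=42)
for _ in range(10):
    regr.partial_fit(X, y)
loss_before = regr.loss_

# Round-trip the model through joblib (BytesIO stands in for a file on disk)
buf = BytesIO()
joblib.dump(regr, buf)
buf.seek(0)
restored = joblib.load(buf)

# The restored model should pick up exactly where the original stopped
loss_after_load = restored.loss_
print(loss_after_load == loss_before)

# Continuing partial_fit should keep reducing the loss
for _ in range(10):
    restored.partial_fit(X, y)
print(restored.loss_ < loss_before)
```

If the first check fails, the save/load step is the problem; if the loss stops decreasing only with the real data, convergence or too few passes is the more likely explanation.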
Instead of calling a script multiple times and dumping the results between iterations, it is often easier to debug if initialization, preprocessing, training, and visualization all happen in one script. Here is a minimal example:
```python
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y)

regr = MLPRegressor()
losses = []
test_performance = []

for _ in range(100):  # Make 100 passes over the batches
    for batch in range(500, 7501, 500):
        # Perform partial fits on batches of 500 examples
        # Simulate batches, these could also be loaded from `.npy`
        X_train_batch = X_train[batch-500:batch]
        y_train_batch = y_train[batch-500:batch]
        regr.partial_fit(X_train_batch, y_train_batch)
        losses.append(regr.loss_)
        test_performance.append(regr.score(X_test, y_test))

# Plotting results:
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.title.set_text("Training Loss")
ax2.title.set_text("Score on test set")
ax1.plot(range(len(losses)), losses)
ax2.plot(range(len(test_performance)), test_performance)
plt.show()
```
Output: