I am trying to match the accuracy of a model.predict call to the final val_accuracy of model.fit(). I am using tf dataset.
val_ds = tf.keras.utils.image_dataset_from_directory( 'my_path', validation_split=0.2, subset="validation", seed=38, image_size=(SIZE,SIZE), )
The dataset setup for train_ds is similar. I prefetch both…
train_ds = train_ds.prefetch(buffer_size=AUTOTUNE) val_ds = val_ds.prefetch(buffer_size=AUTOTUNE)
Than I get the labels for the val_ds so I can use them later
true_categories = tf.concat([y for x, y in val_ds], axis=0)
My model
inputs = tf.keras.Input(shape=(SIZE, SIZE, 3)) # ... some other layers outputs = tf.keras.layers.Dense( len(CLASS_NAMES), activation = tf.keras.activations.softmax)(intermediate) model = tf.keras.Model(inputs, outputs)
Compiles fine
model.compile( optimizer = 'adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(), metrics = ['accuracy'])
Seems to fit fine
history = model.fit( train_ds, validation_data=val_ds, epochs=10, class_weight=class_weights) #i do weight the classes due to imbalance
The last epoch output
Epoch 10: val_accuracy did not improve from 0.92291 176/176 [==============================] – 191s 1s/step – loss: 0.9876 – accuracy: 0.7318 – val_loss: 0.4650 – val_accuracy: 0.8580
Now I want to verify the val_accuracy == 0.8580 when I run model.predict()
predictions = model.predict(val_ds, verbose=2 ) flattened_predictions = predictions.argmax(axis=1) accuracy = metrics.accuracy_score(true_categories, flattened_predictions) print ("Accuracy = ", accuracy)
Accuracy = 0.7980014275517487
I would have expected that to equal the last val accuracy, which was 0.8580, but it is off. My val_ds uses a seed so I should be getting the images in the same order when I shuffle, right? Getting ground truth labels is a pain using datasets, but I think (???) my method is correct.
I only have two classes and when I look at my predictions variable it looks like I am getting probabilities as I would expect, so I think I set up, compiled and fit my model correctly for sparse categorical cross entropy using softmax on my final layer output.
predictions[:3] #show the first 3 predictions, the values sum to 1.0 as expected
array([[0.42447385, 0.5755262 ], [0.2162129 , 0.7837871 ], [0.31917858, 0.6808214 ]], dtype=float32)
What am I missing?
Advertisement
Answer
What you are missing is that your validation dataset is shuffled at every iteration.
tf.keras.utils.image_dataset_from_directory
has shuffle=True
by default. And that shuffle
method for a TensorFlow dataset has an argument reshuffle_each_iteration
which is None
by default. Therefore it is shuffled everytime.
The seed=38
parameter is used for tracking the samples that reserved for training and validation separately. In other words, with seed
argument we can follow which samples will be used for validation dataset and vice versa.
As an example:
dataset = tf.data.Dataset.range(6) dataset = dataset.shuffle(6, reshuffle_each_iteration=None, seed=154).batch(2) print("First time iteration:") for x in dataset: print(x) print("n") print("Second time iteration") for x in dataset: print(x)
This will print:
First time iteration: tf.Tensor([2 1], shape=(2,), dtype=int64) tf.Tensor([3 0], shape=(2,), dtype=int64) tf.Tensor([5 4], shape=(2,), dtype=int64) Second time iteration tf.Tensor([4 3], shape=(2,), dtype=int64) tf.Tensor([0 5], shape=(2,), dtype=int64) tf.Tensor([2 1], shape=(2,), dtype=int64)
Relevant source code for tf.keras.utils.image_dataset_from_directory
can be found here.
If you want to match predictions with their respective labels, then you can loop over the dataset:
predictions = [] labels = [] for x, y in val_ds: predictions.append(np.argmax(model(x), axis=-1)) labels.append(y.numpy()) predictions = np.concatenate(predictions, axis=0) labels = np.concatenate(labels, axis=0)
Then you can check accuracy.