I was getting different results between my self-implemented code and TensorFlow. I wanted to check each value to see where my error was (loss, gradients, optimizer, etc.).
So I wrote a test script like the one in this repo, inspired by the Fashion-MNIST example. For simplicity I copy-paste it at the end of the question.
Logic:
Basically, I train for 1 epoch on 1 batch, and then save:
- Weights before training
- Gradients
- Weights after that single epoch and batch
Since I use TensorFlow's default SGD optimizer (learning rate 0.01), the saved gradients should be equal to (initial_weights - final_weights) / 0.01. This idea was taken from here.
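To make that logic explicit, here is a minimal sketch on a toy variable and loss (not the original model), showing that a single plain SGD step satisfies grad = (w_before - w_after) / lr:

import tensorflow as tf

w = tf.Variable([3.0, -2.0])
opt = tf.keras.optimizers.SGD(learning_rate=0.01)  # same learning rate as the default 'sgd'

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(w ** 2)  # toy loss, gradient is 2 * w

grad = tape.gradient(loss, w)
w_before = w.numpy().copy()
opt.apply_gradients([(grad, w)])  # w_new = w_old - lr * grad
w_after = w.numpy()

# Recover the gradient from the weight change; both prints should show [6., -4.]
print(grad.numpy())
print((w_before - w_after) / 0.01)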
However, this does not happen. What's more, the results get closer if I divide by 0.0001 instead of 0.01, which, strangely enough, is 0.01^2.
Is there an error in my logic or in my testing code? I cannot find it.
PS: I tried TensorFlow versions 2.2.0 and 2.4.1 on Linux.
import tensorflow as tf
import numpy as np
from pdb import set_trace


def get_dataset():
    fashion_mnist = tf.keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
    return (train_images, train_labels), (test_images, test_labels)


def get_model(init1='glorot_uniform', init2='glorot_uniform'):
    tf.random.set_seed(1)
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu', kernel_initializer=init1),
        tf.keras.layers.Dense(10, kernel_initializer=init2)
    ])
    model.compile(optimizer='sgd',
                  loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
                  metrics=['accuracy'])
    return model


def train(model, x_fit, y_fit):
    # Save weights before training
    np.save("initial_weights.npy", np.array(model.get_weights()))
    # Manually compute and save the gradients for the same batch
    with tf.GradientTape() as g:
        y_pred = model(x_fit)
        loss = tf.keras.losses.categorical_crossentropy(y_pred=y_pred, y_true=y_fit)
        np.save("loss.npy", np.array(loss))
    gradients = g.gradient(loss, model.trainable_weights)
    np.save("gradients.npy", np.array(gradients))
    # Train for exactly one epoch on the single batch of 100 samples
    model.fit(x_fit, y_fit, epochs=1, batch_size=100)
    np.save("final_weights.npy", np.array(model.get_weights()))


if __name__ == "__main__":
    (train_images, train_labels), (test_images, test_labels) = get_dataset()
    model = get_model()
    # One-hot encode the first 100 labels
    y_fit = np.zeros((100, 10))
    for i, val in enumerate(train_labels[:100]):
        y_fit[i][val] = 1.
    train(model, train_images[:100], y_fit)
    results = {
        "loss": np.load("loss.npy", allow_pickle=True),
        "init_weights": np.load("initial_weights.npy", allow_pickle=True),
        "gradients": np.load("gradients.npy", allow_pickle=True),
        "final_weights": np.load("final_weights.npy", allow_pickle=True)
    }
    # Check layer by layer whether gradient == (initial_weights - final_weights) / lr
    for i_w, f_w, gr in zip(results["init_weights"], results["final_weights"], results["gradients"]):
        gr = gr.numpy()
        print(np.allclose(gr, (i_w - f_w) / 0.01))
        # set_trace()
Answer
It looks like the call to fit is averaging the gradient over the batch size. I don't know if it's a bug or if it is by design.
Since you compute your gradient manually anyway, you can just call model.optimizer.apply_gradients to update your weights; that way you should get the correct results.
def train(model, x_fit, y_fit):
    np.save("initial_weights.npy", np.array(model.get_weights()))
    with tf.GradientTape() as g:
        y_pred = model(x_fit)
        loss = tf.keras.losses.categorical_crossentropy(y_pred=y_pred, y_true=y_fit)
        np.save("loss.npy", np.array(loss))
    gradients = g.gradient(loss, model.trainable_weights)
    np.save("gradients.npy", np.array(gradients))
    # Apply the manually computed gradients instead of calling fit()
    model.optimizer.apply_gradients(zip(gradients, model.trainable_weights))
    np.save("final_weights.npy", np.array(model.get_weights()))
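If you do want to keep calling fit, a possible sanity check (a sketch only, assuming fit really applies lr times the summed gradient divided by the batch size of 100) is to rescale the manually computed gradient before the comparison in the question's final loop:

batch_size = 100
lr = 0.01
for i_w, f_w, gr in zip(results["init_weights"], results["final_weights"], results["gradients"]):
    gr = gr.numpy()
    # Compare the per-batch mean gradient with the weight change divided by the learning rate
    print(np.allclose(gr / batch_size, (i_w - f_w) / lr, atol=1e-6))

This would also explain the 0.0001 observation: with batch_size = 100 and lr = 0.01, the extra factor of 1/100 happens to make the effective divisor look like 0.01^2.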