
Debugging TensorFlow fit not making sense

So I was getting different results between a self-implemented training loop and TensorFlow. I wanted to check each intermediate value to see where my error was (loss, gradients, optimizer, etc.).

Therefore I wrote a test script like the one in this repo, inspired by the Fashion MNIST example. For simplicity I copy-paste it at the end of the question.

Logic:

Basically, I train for 1 epoch on 1 batch and then save:

  1. Weights before training
  2. Gradients
  3. Weights after only one epoch on one batch

As I use TensorFlow's default SGD optimizer (learning rate 0.01), the saved gradients should be equal to (initial_weights - final_weights)/0.01. This idea was taken from here.
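
A minimal sketch of the check I am describing (the actual model, loss and data are in the linked repo; the ones below are just placeholders):

import numpy as np
import tensorflow as tf

# Placeholder data and model -- one single batch
x = np.random.rand(32, 10).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
lr = 0.01  # default learning rate of tf.keras.optimizers.SGD
loss_fn = tf.keras.losses.BinaryCrossentropy()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=lr), loss=loss_fn)

# 1. Weights before training
w_before = [w.numpy() for w in model.trainable_variables]

# 2. Gradients at the initial weights
with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# 3. Weights after one epoch on this single batch
model.fit(x, y, batch_size=len(x), epochs=1, verbose=0)
w_after = [w.numpy() for w in model.trainable_variables]

# Expectation for plain SGD: grad == (w_before - w_after) / lr
for g, wb, wa in zip(grads, w_before, w_after):
    print(np.max(np.abs(g.numpy() - (wb - wa) / lr)))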

However, this does not happen. What's more, the results get closer if I divide by 0.0001 instead of 0.01, which, strangely enough, is 0.01^2.

Is there an error in my logic or in my testing code? I cannot find it.

PS: I tried tf versions 2.2.0 and 2.4.1 on Linux.



Answer

It looks like the call to fit is averaging the gradient over the batch size. I don't know whether it's a bug or by design.
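
To illustrate what that averaging looks like, here is a small sketch (hypothetical model and MSE loss, not the code from the question) comparing the gradient of the default mean-reduced loss with the gradient of a summed loss:

import numpy as np
import tensorflow as tf

# Keras losses default to a mean over the batch, so the gradient fit applies
# is 1/batch_size times the gradient of the summed per-example loss.
x = np.random.rand(8, 3).astype("float32")
y = np.random.rand(8, 1).astype("float32")
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(3,))])

mean_loss = tf.keras.losses.MeanSquaredError()  # default: sum over batch / batch size
sum_loss = tf.keras.losses.MeanSquaredError(reduction=tf.keras.losses.Reduction.SUM)

with tf.GradientTape(persistent=True) as tape:
    pred = model(x, training=True)
    l_mean = mean_loss(y, pred)
    l_sum = sum_loss(y, pred)

g_mean = tape.gradient(l_mean, model.trainable_variables)
g_sum = tape.gradient(l_sum, model.trainable_variables)

# g_sum == batch_size * g_mean, up to floating point error
for gm, gs in zip(g_mean, g_sum):
    print(np.max(np.abs(gs.numpy() - len(x) * gm.numpy())))

If the manually saved gradients come from a summed (per-example) loss while fit uses the mean, the two differ by exactly the batch size, which would explain why a smaller divisor brings the numbers closer.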

Since you compute your gradients manually anyway, you can just call model.optimizer.apply_gradients to update your weights; that way you should get the correct results.
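
For example, a minimal sketch of that approach (model, data and loss are placeholders, not the code from the linked repo):

import numpy as np
import tensorflow as tf

# Update the weights with manually computed gradients instead of calling fit.
# The relevant part is the model.optimizer.apply_gradients call.
x = np.random.rand(32, 10).astype("float32")
y = np.random.randint(0, 2, size=(32, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, activation="sigmoid", input_shape=(10,)),
])
loss_fn = tf.keras.losses.BinaryCrossentropy()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss=loss_fn)

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))
grads = tape.gradient(loss, model.trainable_variables)

# One SGD step: w <- w - lr * grad, using exactly the gradients computed above
model.optimizer.apply_gradients(zip(grads, model.trainable_variables))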
