tflite: get_tensor on non-output tensors gives random values




I’m trying to debug my tflite model, which uses custom ops. I’ve found the correspondence between op names (in *.pb) and op ids (in *.tflite), and I’m doing a layer-by-layer comparison to make sure the output differences always stay within 1e-4. Since the difference blows up at the end, I want to find the exact place where my custom layer fails. I’ve tried the following:


Method 1: I use get_tensor to get the output as follows:

from tensorflow.contrib.lite.python import interpreter

# load the model
model = interpreter.Interpreter(model_path='model.tflite')
model.allocate_tensors()

# get tensors (tensor_ids holds the ids found in the *.tflite file)
tensor_output = {}
for i in tensor_ids:
    tensor_output[i] = model.get_tensor(i)

It shows completely wrong, random-looking values (compared to the outputs of the TensorFlow model).


Method 2: convert the *.pb only up to a certain layer, then repeat, basically:

  1. Create a *.pb so that it contains the network only from input up to layer_1.

  2. Convert to tflite (so the output is now layer_1) and check the TF-Lite outputs against the TensorFlow outputs.

  3. Repeat steps 1-2 for layer_2, layer_3, … outputs.

This method requires much more work and many more runs, but it correctly shows that for built-in operations the outputs of the tflite and pb models are identical, and that they only start to differ at my custom ops (whereas in Method 1 the outputs diverge right away, from the first layers).
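The per-layer check used in both methods can be sketched as below. This is only an illustration: the function name first_divergent_layer, the layer names, and the flat-list representation of outputs are assumptions made for the example, not part of TensorFlow or TF-Lite.

```python
def first_divergent_layer(reference, candidate, layer_order, tol=1e-4):
    """Return (layer_name, max_abs_diff) for the first layer whose outputs
    differ by more than tol, or None if every layer stays within tol.

    reference / candidate: dict mapping layer name -> flat list of floats
    (hypothetical layout; e.g. flattened TensorFlow and TF-Lite outputs).
    layer_order: layer names in execution order.
    """
    for name in layer_order:
        ref, cand = reference[name], candidate[name]
        if len(ref) != len(cand):
            raise ValueError("shape mismatch at layer %r" % name)
        # Largest element-wise absolute difference for this layer.
        max_diff = max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)
        if max_diff > tol:
            return name, max_diff
    return None


# Made-up numbers: layer_1 agrees within 1e-4, layer_2 is the first to diverge.
tf_outputs = {"layer_1": [0.1, 0.2], "layer_2": [1.0, 2.0], "layer_3": [3.0]}
lite_outputs = {"layer_1": [0.1, 0.20001], "layer_2": [1.0, 2.5], "layer_3": [9.0]}
result = first_divergent_layer(tf_outputs, lite_outputs,
                               ["layer_1", "layer_2", "layer_3"])
print(result)
```

Reporting only the first layer that exceeds the tolerance is deliberate: once one layer is wrong, every later layer will differ too, so the later differences carry no extra information.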


Question: Why is the behaviour of get_tensor so strange? Maybe it is because I am using TensorFlow 1.9 (when TF-Lite had not been released yet and was available only as a developer preview)?

PS: I am aware of the TF-Lite release, but I’ve manually compiled TensorFlow 1.9 for my project and it is now hard to change the version.

Answer

I had the same problem a few months ago. The thing is, TF-Lite is completely different from TensorFlow: it uses static memory and execution plans, memory-mapped files for faster loading, and it is supposed to optimize everything possible in the network’s forward propagation pipeline.

I’m not a developer of TF-Lite, but I suppose it keeps its memory footprint extremely low by re-using the memory areas of tensors that were computed earlier and are no longer needed. Let’s look at the idea in the following illustration:


Step 1: first, we’re feeding the inputs to a symbolic tensor I (in parentheses). Let’s say the value of it is stored in a buffer called buffer_1.

     op1       op2       op3
(I) ---->  A  ---->  B  ---->  O
_________________________________
^^^        ^^^^^^^^^^^^       ^^^
input      intermediate    output
tensor     tensors         tensor

Step 2: Now, we need to compute op1 on symbolic tensor I to obtain the symbolic tensor A. We compute on buffer_1 and store the value of symbolic tensor A in a buffer called buffer_2.

    [op1]      op2       op3
(I) ----> (A) ---->  B  ---->  O

Step 3: Now, we’re computing op2 on symbolic tensor A to obtain the symbolic tensor B. We compute on buffer_2 and store the value of symbolic tensor B in a buffer called buffer_3.

     op1      [op2]      op3
 I  ----> (A) ----> (B) ---->  O

But wait! Why waste memory on buffer_3 when buffer_1 is now unused, and its value is no longer needed to compute the output O? So, instead of storing the result in buffer_3, we will actually store it in buffer_1!

That’s the basic idea of efficient memory re-use, which I think is implemented in TF-Lite, given the static graph analysis built into toco and its other tooling. And that’s why you can’t simply apply get_tensor to non-output tensors: by the time you read them, their buffers may already have been overwritten by later ops.
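The effect can be reproduced with a toy simulation of the illustration above. This is a sketch of the general buffer re-use idea only, not TF-Lite’s actual memory planner; the two-buffer layout, the tensor-to-buffer mapping, and the arithmetic used for op1, op2 and op3 are all made up for the example.

```python
def run_with_buffer_reuse(x):
    """Run the toy chain I -> A -> B -> O while recycling dead buffers,
    then read back every tensor the way a get_tensor-style call would."""
    # Two physical buffers are enough for a straight chain of ops.
    buffers = [None, None]
    # Hypothetical plan: B re-uses I's buffer, O re-uses A's buffer.
    tensor_to_buffer = {"I": 0, "A": 1, "B": 0, "O": 1}

    def write(name, value):
        buffers[tensor_to_buffer[name]] = value

    def read(name):
        return buffers[tensor_to_buffer[name]]

    write("I", x)                 # feed the input
    write("A", read("I") + 1)     # op1 (made-up arithmetic)
    write("B", read("A") * 2)     # op2: overwrites the buffer that held I
    write("O", read("B") - 3)     # op3: overwrites the buffer that held A

    # Reading after the whole graph has run returns whatever is left in
    # each buffer, not the value the tensor had when it was computed.
    return {name: read(name) for name in ("I", "A", "B", "O")}


snapshot = run_with_buffer_reuse(5)
print(snapshot)
```

With x = 5 the true values are I = 5, A = 6, B = 12 and O = 9, but the read-back for I and A returns the contents left by later ops. Only the tensors written last into each buffer still hold their own values.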


An easier way to debug?

You’ve mentioned that you’re writing a custom op, so I suppose you’ve built tflite with bazel, right? Then you can actually inject some logging code into Interpreter::Invoke() in the file tensorflow/lite/interpreter.cc (in TensorFlow 1.9 the lite sources live under tensorflow/contrib/lite/ instead). An ugly hack, but it works.

PS: I would be glad if any TensorFlow Lite developers came across this and left a comment :)



Source: stackoverflow