I’m trying to debug my tflite model, which uses custom ops. I’ve found the correspondence between op names (in *.pb) and op ids (in *.tflite), and I’m doing a layer-by-layer comparison to make sure the output differences always stay within 1e-4. Since the error blows up at the end, I want to find the exact place where my custom layer fails. I tried the following:
Method 1: I use get_tensor to get the outputs as follows:

    from tensorflow.contrib.lite.python import interpreter

    # load the model
    model = interpreter.Interpreter(model_path='model.tflite')
    model.allocate_tensors()

    # get tensors (tensor_ids holds the op ids found in the *.tflite)
    tensor_output = {}
    for i in tensor_ids:
        tensor_output[i] = model.get_tensor(i)
It shows completely wrong, seemingly random values (compared to the outputs of the TensorFlow model).
Method 2: convert the *.pb only up to a certain layer, then repeat. Basically:

1. Prepare a *.pb so that it contains the network only from input up to layer_1.
2. Convert it to tflite (so the output is now layer_1) and check the outputs of TF-Lite against TensorFlow.
3. Repeat steps 1-2 for each subsequent layer.
This method requires much more work and many more runs, but it correctly shows that for built-in operations the outputs of the tflite and pb models are identical, and that they only start to differ at my custom ops (while in Method 1 the outputs diverge right away, from the first layers).
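The comparison step of Method 2 can be sketched as a small helper that walks the layers in order and reports the first one whose outputs differ by more than the tolerance. This is only an illustration: the function names, the layer names other than layer_1, and all the numbers are made up, and the outputs are assumed to be already flattened into plain lists.

```python
def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two flat sequences."""
    if len(a) != len(b):
        raise ValueError("outputs have different sizes: %d vs %d" % (len(a), len(b)))
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_divergence(layer_order, tf_outputs, tflite_outputs, tol=1e-4):
    """Return (layer_name, diff) for the first layer whose outputs differ
    by more than tol, or None if every layer agrees."""
    for name in layer_order:
        diff = max_abs_diff(tf_outputs[name], tflite_outputs[name])
        if diff > tol:
            return name, diff
    return None


# Made-up values, purely for illustration.
layer_order = ["layer_1", "layer_2", "custom_op"]
tf_outputs = {
    "layer_1": [0.10, 0.20],
    "layer_2": [0.30, 0.40],
    "custom_op": [1.0, 2.0],
}
tflite_outputs = {
    "layer_1": [0.10, 0.20],
    "layer_2": [0.30, 0.40005],  # within tolerance
    "custom_op": [1.5, 2.0],     # clearly off
}

result = first_divergence(layer_order, tf_outputs, tflite_outputs)
print(result)
```

With real models, the two dictionaries would be filled from the TensorFlow run and from each truncated TF-Lite model's final output.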
Question: Why is the behaviour of get_tensor so strange? Maybe it is because I am using tensorflow 1.9 (when TF-Lite was still not released and was available only as a developer preview)?
PS: I am aware of the release of TF-Lite, but I’ve manually compiled TensorFlow 1.9 for my project and now it is hard to change the version.
I had the same problem a few months ago. The thing is, TF-Lite is completely different from TensorFlow: it uses static memory and execution plans, memory-mapped files for faster loading, and it is supposed to optimize everything possible in the network’s forward propagation pipeline.
I’m not a developer of TF-Lite, but I suppose it keeps its memory footprint extremely low by re-using the memory areas that were used for previously computed ops. Let’s look at the idea in the following illustration:
Step 1: first, we feed the inputs to a symbolic tensor I (in parentheses). Let’s say its value is stored in a buffer called buffer_1.

         op1     op2     op3
    (I) ----> A ----> B ----> O
    ____________________________
    ^^^     ^^^^^^^^^^^^     ^^^
    input   intermediate   output
    tensor     tensors     tensor
Step 2: Now, we need to compute op1 on symbolic tensor I to obtain the symbolic tensor A. We compute on buffer_1 and store the value of symbolic tensor A in a buffer called buffer_2.

        [op1]      op2     op3
    (I) ----> (A) ----> B ----> O
Step 3: Now, we compute op2 on symbolic tensor A to obtain the symbolic tensor B. We compute on buffer_2 and store the value of symbolic tensor B in a buffer called buffer_3.

       op1      [op2]      op3
    I ----> (A) ----> (B) ----> O
But wait! Why waste memory on buffer_3 if we now have buffer_1, which is unused and whose value is no longer needed to get the output O? So, instead of storing in buffer_3, we will actually store the result of this operation in buffer_1.
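To make the three steps concrete, here is a minimal sketch of a greedy buffer planner for a linear chain of ops. It is a toy model of the idea described above and not TF-Lite’s actual planner; the function name plan_buffers and the tuple layout of ops are made up for the illustration. Buffer index 0 corresponds to buffer_1, index 1 to buffer_2, and so on.

```python
def plan_buffers(ops, graph_outputs):
    """Greedy buffer assignment for ops given in execution order.

    ops: list of (op_name, input_tensors, output_tensor).
    graph_outputs: tensors whose buffers must never be recycled.
    Returns {tensor_name: buffer_index}.
    """
    # Index of the last op that reads each tensor.
    last_use = {}
    for step, (_, inputs, _) in enumerate(ops):
        for t in inputs:
            last_use[t] = step

    assignment = {}
    free = []        # buffer indices that may be reused
    next_buffer = 0

    def allocate(tensor):
        nonlocal next_buffer
        if free:
            assignment[tensor] = free.pop(0)
        else:
            assignment[tensor] = next_buffer
            next_buffer += 1

    # Graph inputs (read but never produced by an op) get buffers first.
    produced = {out for _, _, out in ops}
    for _, inputs, _ in ops:
        for t in inputs:
            if t not in produced and t not in assignment:
                allocate(t)

    for step, (_, inputs, output) in enumerate(ops):
        # The output needs a buffer while the inputs are still being read.
        allocate(output)
        # Inputs that are never read again release their buffers.
        for t in set(inputs):
            if last_use[t] == step and t not in graph_outputs:
                free.append(assignment[t])
        free.sort()
    return assignment


ops = [("op1", ["I"], "A"), ("op2", ["A"], "B"), ("op3", ["B"], "O")]
plan = plan_buffers(ops, graph_outputs={"O"})
for tensor in ["I", "A", "B", "O"]:
    print(tensor, "-> buffer_%d" % (plan[tensor] + 1))
```

In this toy model B lands in buffer_1, as in Step 3 above, and the whole chain needs only two buffers instead of four.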
That’s the basic idea of efficient memory re-use, which I think is implemented in TF-Lite, given its built-in static graph analyzer in toco and other such tooling. And that’s why you can’t simply apply get_tensor to non-output tensors: their buffers may already have been overwritten by later ops.
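The effect on reading intermediate tensors can be reproduced with a tiny simulation. Everything here is hypothetical: the buffer plan, the three arithmetic ops, and the get_tensor helper are stand-ins for illustration and have nothing to do with the real TF-Lite API. The plan also assumes that O reuses the buffer of A, which goes one step beyond the walk-through above.

```python
# Hypothetical plan: B reuses the buffer of I, and O reuses the buffer of A.
plan = {"I": "buffer_1", "A": "buffer_2", "B": "buffer_1", "O": "buffer_2"}

# (op name, input tensor, output tensor, computation)
ops = [
    ("op1", "I", "A", lambda x: x + 1),
    ("op2", "A", "B", lambda x: x * 10),
    ("op3", "B", "O", lambda x: x - 3),
]


def run(input_value):
    """Execute the chain, writing every result into its planned buffer."""
    buffers = {plan["I"]: input_value}
    true_values = {"I": input_value}  # kept only for comparison
    for _, src, dst, fn in ops:
        result = fn(buffers[plan[src]])
        buffers[plan[dst]] = result
        true_values[dst] = result
    return buffers, true_values


def get_tensor(buffers, name):
    """Read whatever currently sits in the buffer assigned to a tensor."""
    return buffers[plan[name]]


buffers, true_values = run(2)
for name in ["I", "A", "B", "O"]:
    print(name, "read:", get_tensor(buffers, name), "true:", true_values[name])
```

After the run, only the tensors whose buffers were not overwritten afterwards read back correctly; I and A return the values of B and O that replaced them.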
An easier way to debug?
You’ve mentioned that you’re writing a custom op, so I suppose you’ve built TF-Lite with bazel, right? Then you can actually inject some logging code into Interpreter::Invoke() in the file tensorflow/lite/interpreter.cc (tensorflow/contrib/lite/interpreter.cc in TensorFlow 1.9, where TF-Lite still lives under contrib). An ugly hack, but it works.
PS: I would be glad if any TensorFlow Lite developers came across this and left a comment :)