TensorFlow TextVectorization producing Ragged Tensor with no padding after loading it from pickle

Question

I have a TensorFlow TextVectorization layer named eng_vectorization, and I saved it to a pickle file. I then loaded that pickle file as new_eng_vectorization. I expected the original layer, eng_vectorization, and the newly loaded layer, new_eng_vectorization, to behave the same, but they do not. The output of the original layer, eng_vectorization(['Hello people']), is a Tensor, while the loaded layer returns a RaggedTensor with no padding.

Accepted Answer

The problem is related to a very recent bug, where output_mode is not set correctly when it comes from a saved configuration.

This works:

    import pickle
    import tensorflow as tf
    # Older TF 2.x releases expose this layer under
    # tensorflow.keras.layers.experimental.preprocessing instead.
    from tensorflow.keras.layers import TextVectorization

    # Save the layer's config and weights.
    with open("english_vocab.pkl", "wb") as f:
        pickle.dump({'config': eng_vectorization.get_config(),
                     'weights': eng_vectorization.get_weights()}, f)

    with open("english_vocab.pkl", "rb") as f:
        from_disk = pickle.load(f)

    # Pass output_mode as the literal 'int' instead of reading it from the config.
    new_eng_vectorization = TextVectorization(
        max_tokens=from_disk['config']['max_tokens'],
        output_mode='int',
        output_sequence_length=from_disk['config']['output_sequence_length'])
    # Adapt on dummy data first; set_weights() then overwrites that state.
    new_eng_vectorization.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
    new_eng_vectorization.set_weights(from_disk['weights'])
    new_eng_vectorization(['Hello people'])

This is currently not working correctly:

    with open("english_vocab.pkl", "wb") as f:
        pickle.dump({'config': eng_vectorization.get_config(),
                     'weights': eng_vectorization.get_weights()}, f)

    with open("english_vocab.pkl", "rb") as f:
        from_disk = pickle.load(f)

    # Same as above, except output_mode is taken from the saved config.
    new_eng_vectorization = TextVectorization(
        max_tokens=from_disk['config']['max_tokens'],
        output_mode=from_disk['config']['output_mode'],
        output_sequence_length=from_disk['config']['output_sequence_length'])
    new_eng_vectorization.adapt(tf.data.Dataset.from_tensor_slices(["xyz"]))
    new_eng_vectorization.set_weights(from_disk['weights'])
    new_eng_vectorization(['Hello people'])

The second version fails even though 'int' and from_disk['config']['output_mode'] are equal and of the same data type. For now, you can use the first version as a workaround.
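If it is unclear what the missing padding looks like, here is a minimal plain-Python sketch of the difference between variable-length rows and rows forced to a fixed length, which is what output_sequence_length is meant to do in 'int' mode. This is only an illustration, not TensorFlow's implementation; pad_or_truncate and the token ids are hypothetical names and values made up for the example.

```python
from typing import List


def pad_or_truncate(rows: List[List[int]], length: int, pad_id: int = 0) -> List[List[int]]:
    """Force every row of token ids to exactly `length` entries.

    Hypothetical helper: rows longer than `length` are cut off,
    shorter rows are filled on the right with `pad_id`.
    """
    fixed = []
    for row in rows:
        clipped = row[:length]
        fixed.append(clipped + [pad_id] * (length - len(clipped)))
    return fixed


# Made-up token ids for two sentences of different lengths.
# This is the "ragged, no padding" shape described in the question.
ragged_rows = [[12, 7], [3, 45, 8, 19, 2]]

# The same rows forced to a fixed sequence length of 4.
dense_rows = pad_or_truncate(ragged_rows, length=4)
print(dense_rows)  # [[12, 7, 0, 0], [3, 45, 8, 19]]
```

With a correctly restored layer, every output row should have length output_sequence_length, like dense_rows here, rather than keeping its own length like ragged_rows.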

Answer