How does the Tokenizer in TensorFlow deal with out-of-vocabulary tokens if I don’t provide oov_token?

Question

I didn't get any error with my code even though I didn't provide the `oov_token` argument. I expected to get an error in `test_tweets = tokenizer.texts_to_sequences(X_test)`. How does TensorFlow deal with out-of-vocabulary words at test time when you don't provide `oov_token`?

Accepted Answer

OOV words are ignored (discarded) by default if `oov_token` is `None`:

```python
import tensorflow as tf

tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(['hello world'])
print(tokenizer.word_index)

sequences = tokenizer.texts_to_sequences(['hello friends'])
print(sequences)
```

Output:

```
{'hello': 1, 'world': 2}
[[1]]
```

"friends" was never seen by `fit_on_texts`, so it is silently dropped from the sequence instead of raising an error.
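To see why nothing fails, here is a minimal pure-Python sketch of the lookup that `texts_to_sequences` performs, assuming the behaviour shown in the answer: indices start at 1 in order of frequency, an `oov_token` (if given) is reserved at index 1, and unknown words are otherwise skipped. The helper names `fit_word_index` and `to_sequences` are hypothetical, and the sketch only lowercases and splits on whitespace, whereas the real `Tokenizer` also strips punctuation through its `filters` argument.

```python
from collections import Counter


def fit_word_index(texts, oov_token=None):
    """Hypothetical helper: build a word -> index map, most frequent first.

    Indices start at 1. If oov_token is given it is reserved at index 1,
    mirroring (by assumption) what the Keras Tokenizer does.
    """
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    # most_common() keeps first-seen order for words with equal counts.
    vocab = [word for word, _ in counts.most_common()]
    if oov_token is not None:
        vocab = [oov_token] + vocab
    return {word: i for i, word in enumerate(vocab, start=1)}


def to_sequences(texts, word_index, oov_token=None):
    """Hypothetical helper: map each text to a list of word indices."""
    oov_index = word_index.get(oov_token) if oov_token is not None else None
    sequences = []
    for text in texts:
        seq = []
        for word in text.lower().split():
            index = word_index.get(word)
            if index is not None:
                seq.append(index)
            elif oov_index is not None:
                seq.append(oov_index)  # unknown word -> OOV index
            # otherwise: unknown word with no oov_token is silently dropped
        sequences.append(seq)
    return sequences


if __name__ == '__main__':
    word_index = fit_word_index(['hello world'])
    print(word_index)                                   # {'hello': 1, 'world': 2}
    print(to_sequences(['hello friends'], word_index))  # [[1]]

    word_index = fit_word_index(['hello world'], oov_token='<OOV>')
    print(word_index)  # {'<OOV>': 1, 'hello': 2, 'world': 3}
    print(to_sequences(['hello friends'], word_index, oov_token='<OOV>'))  # [[2, 1]]
```

With the actual Keras `Tokenizer`, passing `oov_token='<OOV>'` to the constructor in the answer's snippet likewise gives `{'<OOV>': 1, 'hello': 2, 'world': 3}` and `[[2, 1]]`, so unseen test words are kept as a placeholder rather than dropped.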


Answer