I’m familiar with the method fit_on_texts from the Keras Tokenizer. What does fit_on_sequences do, and when is it useful? According to the documentation, it “Updates internal vocabulary based on a list of sequences.”, and it takes as input: “A list of sequence. A ‘sequence’ is a list of integer word indices.”
For fitting on texts, I understand that the text is parsed into tokens and each token is assigned an index (integer). Thus, the tokenizer object contains, among other things, a dictionary relating tokens (strings) to indices (integers). However, if I give it only sequences of numbers and call fit_on_sequences, how would it know which tokens those integers represent?
As an experiment, try the following:
from tensorflow.keras.preprocessing.text import Tokenizer

test_seq = [[1, 2, 3, 4, 5, 6]]
tok = Tokenizer()
tok.fit_on_sequences(test_seq)
Then the properties word_index and index_word, which would otherwise contain the vocabulary dictionaries, are, of course, empty. The documentation also says of fit_on_sequences: “Required before using sequences_to_matrix (if fit_on_texts was never called).” However, calling sequences_to_matrix after only fit_on_sequences (without fit_on_texts) does not work either. So, what is fit_on_sequences actually used for?
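For completeness, here is what that failure looks like (a minimal sketch assuming TensorFlow 2.x; the exact error message may vary between versions):

from tensorflow.keras.preprocessing.text import Tokenizer

test_seq = [[1, 2, 3, 4, 5, 6]]
tok = Tokenizer()  # note: no num_words given
tok.fit_on_sequences(test_seq)
tok.sequences_to_matrix(test_seq)
# Raises a ValueError: with word_index empty and num_words unset,
# the tokenizer cannot determine the width of the output matrix.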
Answer
sequences_to_matrix does work after calling fit_on_sequences; you just need to specify the argument num_words in the Tokenizer() instantiation.
from tensorflow.keras.preprocessing.text import Tokenizer

test_seq = [[1, 2, 3, 4, 5, 6]]
tok = Tokenizer(num_words=10)
tok.fit_on_sequences(test_seq)
tok.sequences_to_matrix(test_seq)
array([[0., 1., 1., 1., 1., 1., 1., 0., 0., 0.]])
The zero at the beginning is there because there is no 0 in your sequence, and the zeroes at the end are there because I specified num_words=10 but the highest value in your test sequence is 6.
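As a side note, fit_on_sequences does update some internal state: in the tf.keras implementation it records how many documents each index appears in. A minimal sketch (the attribute names index_docs and document_count are implementation details, not a documented API):

from tensorflow.keras.preprocessing.text import Tokenizer

test_seq = [[1, 2, 3, 4, 5, 6]]
tok = Tokenizer(num_words=10)
tok.fit_on_sequences(test_seq)

print(tok.word_index)        # {} -- no strings were ever seen
print(dict(tok.index_docs))  # {1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1} -- document frequency per index
print(tok.document_count)    # 1

Those document frequencies are what the 'tfidf' mode of sequences_to_matrix relies on.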
The purpose it serves is simply to skip the step of mapping strings to integers: it works directly with the integer word indices.
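That is exactly the situation when your data arrives already tokenized as integers. For example, the IMDB dataset bundled with Keras ships as lists of integer word indices, so there is no raw text to call fit_on_texts on. A hedged sketch (num_words=1000 is an arbitrary choice for illustration):

from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.text import Tokenizer

# IMDB reviews are pre-tokenized as lists of integer word indices.
(x_train, _), _ = imdb.load_data(num_words=1000)

tok = Tokenizer(num_words=1000)
tok.fit_on_sequences(x_train)  # collects document frequencies per index

x_binary = tok.sequences_to_matrix(x_train[:5], mode="binary")
x_tfidf = tok.sequences_to_matrix(x_train[:5], mode="tfidf")  # uses the counts gathered above
print(x_binary.shape, x_tfidf.shape)  # (5, 1000) (5, 1000)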