I have a model where I do various preprocessing steps outside the model itself.
One part of the preprocessing is a category encoder based on Keras:
cat_index = tf.keras.layers.StringLookup(vocabulary=cat_word_list)
cat_encoder = tf.keras.layers.CategoryEncoding(num_tokens=cat_index.vocabulary_size(), output_mode="one_hot")
I then apply this with
encoded_cat = cat_encoder(cat_index(data['cat_val'])).numpy()
encoded_cat = pd.DataFrame(encoded_cat, columns=['cat_' + str(i) for i in range(len(encoded_cat[0]))]).astype('int64')
data = pd.merge(data, encoded_cat, left_index=True, right_index=True)
data.drop(columns=['cat_val'], inplace=True)
to my pandas DataFrame.
Now I want to store my model, and in order to store the model I also have to store the two preprocessing layers cat_index
and cat_encoder
. Unfortunately, I wasn't able to figure out how to store these layers on a file system. If I try to call a save function, I get
‘CategoryEncoding’ object has no attribute ‘save’
How can a preprocessing layer like this be stored to a file system so that it can be reused during inference?
One workaround that comes to mind is to store the cat_word_list and recreate the layers, but I expect there is a more Keras-based approach.
Answer
Use the layer's get_config
method to get the configuration:
Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
For example:
cat_word_list = ['cat', 'tiger', 'lion', 'dog']
cat_index = tf.keras.layers.StringLookup(vocabulary=cat_word_list)
cat_encoder = tf.keras.layers.CategoryEncoding(num_tokens=cat_index.vocabulary_size(), output_mode="one_hot")
cat_index_config = cat_index.get_config()
cat_encoder_config = cat_encoder.get_config()
This should contain all information needed to recreate the layers:
cat_index_config
Output:
{'name': 'string_lookup', 'trainable': True, 'dtype': 'int64', 'invert': False, 'max_tokens': None, 'num_oov_indices': 1, 'oov_token': '[UNK]', 'mask_token': None, 'output_mode': 'int', 'sparse': False, 'pad_to_max_tokens': False, 'vocabulary': ListWrapper(['cat', 'tiger', 'lion', 'dog']), 'idf_weights': None, 'encoding': 'utf-8'}
You can recreate the layers like this:
cat_index_2 = tf.keras.layers.StringLookup(**cat_index_config)
cat_encoder_2 = tf.keras.layers.CategoryEncoding(**cat_encoder_config)
The recreated layers have the same configuration, e.g.
cat_index_2.get_config()
Output:
{'name': 'string_lookup', 'trainable': True, 'dtype': 'int64', 'invert': False, 'max_tokens': None, 'num_oov_indices': 1, 'oov_token': '[UNK]', 'mask_token': None, 'output_mode': 'int', 'sparse': False, 'pad_to_max_tokens': False, 'vocabulary': ListWrapper(['cat', 'tiger', 'lion', 'dog']), 'idf_weights': None, 'encoding': 'utf-8'}
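Since the goal is to reuse the layers at inference time, the config dictionaries can also be written to the file system and reloaded later. A minimal sketch, assuming JSON files named cat_index_config.json and cat_encoder_config.json (hypothetical paths) and converting the vocabulary ListWrapper to a plain list so the dictionary is JSON-serializable:
import json

# Write the config dictionaries to disk (hypothetical file names).
# Convert the vocabulary ListWrapper to a plain list so json can handle it.
cat_index_config['vocabulary'] = list(cat_index_config['vocabulary'])
with open('cat_index_config.json', 'w') as f:
    json.dump(cat_index_config, f)
with open('cat_encoder_config.json', 'w') as f:
    json.dump(cat_encoder_config, f)

# Later, during inference, reload the configs and recreate the layers:
with open('cat_index_config.json') as f:
    cat_index_2 = tf.keras.layers.StringLookup(**json.load(f))
with open('cat_encoder_config.json') as f:
    cat_encoder_2 = tf.keras.layers.CategoryEncoding(**json.load(f))
Note that a StringLookup built from an explicit vocabulary list, as here, carries no additional trained state, so the config alone is enough to reconstruct it.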