Save Keras preprocessing layer

I have a model for which I do several preprocessing steps outside the model itself.

One part of the preprocessing uses a Keras-based category encoder:

cat_index = tf.keras.layers.StringLookup(vocabulary=cat_word_list)
cat_encoder = tf.keras.layers.CategoryEncoding(num_tokens=cat_index.vocabulary_size(), output_mode="one_hot")

I then apply this with

# One-hot encode the categorical column and append the result as new integer columns
encoded_cat = cat_encoder(cat_index(data['cat_val'])).numpy()
encoded_cat = pd.DataFrame(encoded_cat, columns=['cat_' + str(i) for i in range(len(encoded_cat[0]))]).astype('int64')

data = pd.merge(data, encoded_cat, left_index=True, right_index=True)
data.drop(columns=['cat_val'], inplace=True)

to my pandas dataframe.

Now I want to store my model, and in order to do that I also need to store the two preprocessing layers cat_index and cat_encoder. Unfortunately, I haven't been able to figure out how to store these layers on the file system. If I try to call a save function, I get

‘CategoryEncoding’ object has no attribute ‘save’

How can a preprocessing layer like this be stored to a file system so that it can be reused during inference?

One workaround that comes to mind is to store cat_word_list and recreate the layers, but I expect there is a more Keras-based approach.
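Roughly, that workaround would look like this (just a sketch; the file name cat_word_list.json is arbitrary):

import json
import tensorflow as tf

# Save the vocabulary the layers were built from
with open('cat_word_list.json', 'w') as f:
    json.dump(cat_word_list, f)

# Later, at inference time: reload it and rebuild the same layers
with open('cat_word_list.json') as f:
    cat_word_list = json.load(f)

cat_index = tf.keras.layers.StringLookup(vocabulary=cat_word_list)
cat_encoder = tf.keras.layers.CategoryEncoding(num_tokens=cat_index.vocabulary_size(), output_mode="one_hot")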


Answer

Use the layer's get_config method to get its configuration:

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

For example:

import tensorflow as tf

cat_word_list = ['cat', 'tiger', 'lion', 'dog']
cat_index = tf.keras.layers.StringLookup(vocabulary=cat_word_list)
cat_encoder = tf.keras.layers.CategoryEncoding(num_tokens=cat_index.vocabulary_size(), output_mode="one_hot")

cat_index_config = cat_index.get_config()
cat_encoder_config = cat_encoder.get_config()

This should contain all information needed to recreate the layers:

cat_index_config

Output:

{'name': 'string_lookup',
 'trainable': True,
 'dtype': 'int64',
 'invert': False,
 'max_tokens': None,
 'num_oov_indices': 1,
 'oov_token': '[UNK]',
 'mask_token': None,
 'output_mode': 'int',
 'sparse': False,
 'pad_to_max_tokens': False,
 'vocabulary': ListWrapper(['cat', 'tiger', 'lion', 'dog']),
 'idf_weights': None,
 'encoding': 'utf-8'}

You can recreate the layers like this:

cat_index_2 = tf.keras.layers.StringLookup(**cat_index_config)
cat_encoder_2 = tf.keras.layers.CategoryEncoding(**cat_encoder_config)

The recreated layers have the same configuration, e.g.

cat_index_2.get_config()

Output:

{'name': 'string_lookup',
 'trainable': True,
 'dtype': 'int64',
 'invert': False,
 'max_tokens': None,
 'num_oov_indices': 1,
 'oov_token': '[UNK]',
 'mask_token': None,
 'output_mode': 'int',
 'sparse': False,
 'pad_to_max_tokens': False,
 'vocabulary': ListWrapper(['cat', 'tiger', 'lion', 'dog']),
 'idf_weights': None,
 'encoding': 'utf-8'}
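
If you also want to persist these configs to the file system, one option (a sketch, assuming plain JSON files fit your setup; the file names are just examples) is to dump the config dictionaries with json and rebuild the layers from them at inference time:

import json

# get_config() returns plain Python types; the vocabulary may come back as a
# ListWrapper, which is a list subclass, so converting it is only a safety net.
cat_index_config['vocabulary'] = list(cat_index_config['vocabulary'])

with open('cat_index_config.json', 'w') as f:
    json.dump(cat_index_config, f)
with open('cat_encoder_config.json', 'w') as f:
    json.dump(cat_encoder_config, f)

# Later, at inference time
with open('cat_index_config.json') as f:
    cat_index_2 = tf.keras.layers.StringLookup(**json.load(f))
with open('cat_encoder_config.json') as f:
    cat_encoder_2 = tf.keras.layers.CategoryEncoding(**json.load(f))

Because the config already carries the vocabulary, this also removes the need to ship cat_word_list separately.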