I am trying to train a model on a TPU in Colab. It takes two np.ndarray inputs: an image of shape (150, 150, 3) and an audio spectrogram image of shape (259, 128, 1). I have created my dataset using NumPy arrays as follows:
trainX = [train_image_array, train_spect_array]
trainY = labels_array
Here the shape of each array is as follows:

train_image_array.shape = (86802, 150, 150, 3)
train_spect_array.shape = (86802, 259, 128, 1)
labels_array.shape = (86802,)
I also have a similar dataset for testing, except that instead of 86K samples it has 9K samples. When I evaluate my model on the testing data it works, but when I try to train or evaluate the model on the training data, it shows:
<ipython-input-20-9240f9fc84df> in runModel(model, trainX, trainY, testX, testY, patience, resetWeights, checkpointPath, epochs, save_checkpoint, batch_size, generator, save_weights, save_weights_path, metrics)
     76     model.evaluate(testX, testY, batch_size=batch_size)
     77     # model.evaluate(trainX, trainY, batch_size=batch_size)
---> 78     history = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, validation_data=(testX, testY), shuffle=True, callbacks=callbacks)
     79     # model.evaluate(trainX, trainY, batch_size=batch_size)
     80     model.evaluate(testX, testY, batch_size=batch_size)

/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    100   dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101   ctx.ensure_initialized()
--> 102   return ops.EagerTensor(value, ctx.device_name, dtype)
    103
    104
Here, runModel(...) is my function, which just consists of model.evaluate, model.fit, plotting of graphs, etc. The main problem is at model.fit(trainX, trainY, ...). The same error arises on model.evaluate(trainX, trainY, ...). I thought it might occur only on model.evaluate, so I commented that out, but I was wrong 🤦🏻♂️.
Can anyone help me?
Answer
The only solution I found for this problem was that, for very large datasets like this, we should serialize the data into .tfrecord files and load them through a TensorFlow tf.data.Dataset instead of passing the NumPy arrays directly. Also, when using a TPU, the .tfrecord files need to be stored in a Google Cloud Storage bucket, since a Cloud TPU reads its input data from GCS rather than the local Colab filesystem.
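A minimal sketch of that approach is below: each (image, spectrogram, label) sample is serialized into a TFRecord file, then read back as a tf.data.Dataset whose elements match the two-input model. The shapes come from the question; the file path, the tiny dummy arrays, and the helper names are my own illustrative assumptions, not the asker's actual code (on a TPU the path would be a gs:// bucket URL instead of a local file).

```python
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_tfrecord(path, images, spects, labels):
    # One tf.train.Example per sample; arrays stored as raw float32 bytes.
    with tf.io.TFRecordWriter(path) as writer:
        for img, spect, label in zip(images, spects, labels):
            feature = {
                "image": _bytes_feature(img.astype(np.float32).tobytes()),
                "spect": _bytes_feature(spect.astype(np.float32).tobytes()),
                "label": _int64_feature(int(label)),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

def parse_example(serialized):
    # Decode the raw bytes back into fixed-shape float32 tensors.
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "spect": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.reshape(tf.io.decode_raw(parsed["image"], tf.float32), (150, 150, 3))
    spect = tf.reshape(tf.io.decode_raw(parsed["spect"], tf.float32), (259, 128, 1))
    # A two-tensor tuple matches a two-input Keras model's expected structure.
    return (image, spect), parsed["label"]

# Dummy stand-ins for the real arrays, just to make the sketch runnable.
images = np.zeros((4, 150, 150, 3), dtype=np.float32)
spects = np.zeros((4, 259, 128, 1), dtype=np.float32)
labels = np.array([0, 1, 0, 1])

# On a TPU this path would be e.g. "gs://my-bucket/train.tfrecord" (assumed name).
write_tfrecord("train.tfrecord", images, spects, labels)
dataset = (tf.data.TFRecordDataset("train.tfrecord")
           .map(parse_example)
           .batch(32))
# dataset can now be passed to model.fit(dataset, ...) directly.
```

Because the arrays never have to be converted into a single in-memory eager tensor, this also avoids the convert_to_eager_tensor failure seen in the traceback.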