I am trying to train a model on a TPU in Colab. It takes two np.ndarray inputs: an image of shape (150, 150, 3) and an audio spectrogram image of shape (259, 128, 1). I have created my dataset using NumPy arrays as follows:
trainX = [train_image_array, train_spect_array]
trainY = labels_array
Here the shape of each array is as follows:

train_image_array.shape = (86802, 150, 150, 3)
train_spect_array.shape = (86802, 259, 128, 1)
labels_array.shape = (86802,)
I also have a similar dataset for testing, except that instead of 86K samples it has 9K samples. When I evaluate my model on the testing data it works, but when I try to train or evaluate the model on the training data, it shows:
<ipython-input-20-9240f9fc84df> in runModel(model, trainX, trainY, testX, testY, patience, resetWeights, checkpointPath, epochs, save_checkpoint, batch_size, generator, save_weights, save_weights_path, metrics)
     76     model.evaluate(testX, testY, batch_size=batch_size)
     77     # model.evaluate(trainX, trainY, batch_size=batch_size)
---> 78     history = model.fit(trainX, trainY, epochs=epochs, batch_size=batch_size, validation_data=(testX, testY), shuffle=True, callbacks=callbacks)
     79     # model.evaluate(trainX, trainY, batch_size=batch_size)
     80     model.evaluate(testX, testY, batch_size=batch_size)

/usr/local/lib/python3.7/dist-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    100   dtype = dtypes.as_dtype(dtype).as_datatype_enum
    101   ctx.ensure_initialized()
--> 102   return ops.EagerTensor(value, ctx.device_name, dtype)
    103
    104
Here, runModel(...) is my function, which just consists of model.evaluate, model.fit, plotting of graphs, etc. The main problem is at model.fit(trainX, trainY, ...). The same error arises on model.evaluate(trainX, trainY, ...). I thought it might occur only on model.evaluate, so I commented that out, but I was wrong 🤦🏻♂️.
Can anyone help me?
Answer
The only solution I found for this problem was that, for very large datasets like this, we should serialize the data into .tfrecord files and load them through a TensorFlow tf.data.Dataset instead of passing the NumPy arrays directly. Also, when using a TPU, the .tfrecord files need to be stored in a Google Cloud Storage bucket, since a Cloud TPU reads its input data from GCS rather than the local Colab filesystem.
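A minimal sketch of that approach is below: each (image, spectrogram, label) sample is serialized into a TFRecord file, then read back as a tf.data.Dataset whose elements match the two-input model. The shapes come from the question; the file path, the tiny dummy arrays, and the helper names are my own illustrative assumptions, not the asker's actual code (on a TPU the path would be a gs:// bucket URL instead of a local file).

```python
import numpy as np
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def write_tfrecord(path, images, spects, labels):
    # One tf.train.Example per sample; arrays stored as raw float32 bytes.
    with tf.io.TFRecordWriter(path) as writer:
        for img, spect, label in zip(images, spects, labels):
            feature = {
                "image": _bytes_feature(img.astype(np.float32).tobytes()),
                "spect": _bytes_feature(spect.astype(np.float32).tobytes()),
                "label": _int64_feature(int(label)),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

def parse_example(serialized):
    # Decode the raw bytes back into fixed-shape float32 tensors.
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "spect": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.reshape(tf.io.decode_raw(parsed["image"], tf.float32), (150, 150, 3))
    spect = tf.reshape(tf.io.decode_raw(parsed["spect"], tf.float32), (259, 128, 1))
    # A two-tensor tuple matches a two-input Keras model's expected structure.
    return (image, spect), parsed["label"]

# Dummy stand-ins for the real arrays, just to make the sketch runnable.
images = np.zeros((4, 150, 150, 3), dtype=np.float32)
spects = np.zeros((4, 259, 128, 1), dtype=np.float32)
labels = np.array([0, 1, 0, 1])

# On a TPU this path would be e.g. "gs://my-bucket/train.tfrecord" (assumed name).
write_tfrecord("train.tfrecord", images, spects, labels)
dataset = (tf.data.TFRecordDataset("train.tfrecord")
           .map(parse_example)
           .batch(32))
# dataset can now be passed to model.fit(dataset, ...) directly.
```

Because the arrays never have to be converted into a single in-memory eager tensor, this also avoids the convert_to_eager_tensor failure seen in the traceback.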