I am trying to train a model on a TPU in Colab. The model takes two np.ndarray inputs: an image of shape (150, 150, 3) and an audio spectrogram image of shape (259, 128, 1).

I have created my dataset from NumPy arrays as follows:

The shape of each array is as follows:

I