For reference the full error is here:
WARNING:tensorflow:Model was constructed with shape (None, 65536) for input KerasTensor(type_spec=TensorSpec(shape=(None, 65536), dtype=tf.float32, name='input_1'), name='input_1', description="created by layer 'input_1'"), but it was called on an input with incompatible shape (None, 65536, None).
I am using kymatio to classify audio signals. Before constructing the model I use TensorFlow's tf.keras.utils.audio_dataset_from_directory to create the training and testing sets. The audio samples are of shape (65536,) before the sets are created. To create the sets I use the following code:
    T = 2**16
    J = 8
    Q = 12
    log_eps = 1e-6
    SEED = 42

    train_dataset = tf.keras.utils.audio_dataset_from_directory(
        '../train',
        labels='inferred',
        label_mode='int',
        class_names=['x', 'y', 'z', 'xy', 'xz', 'yz', 'xyz'],
        batch_size=32,
        output_sequence_length=T,
        ragged=False,
        shuffle=True,
        seed=SEED,
        follow_links=False
    )
The element_spec of the train_dataset is (TensorSpec(shape=(None, 65536, None), dtype=tf.float32, name=None), TensorSpec(shape=(None,), dtype=tf.int32, name=None)).
So at some point the shape in the TensorSpec changes from (None, 65536) to (None, 65536, None), and I don't know why.
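A quick way to confirm the mismatch before calling model.fit(...) is to compare the two shapes directly. This is only a minimal sketch: the two shapes are copied by hand from the warning and from the element_spec above, and the variable names are made up for illustration.

```python
import tensorflow as tf

T = 2**16  # samples per clip, as in the question

# Shape the model was built with, taken from the warning message.
model_input = tf.TensorShape([None, T])
# Shape of the audio tensor, taken from the dataset's element_spec.
dataset_audio = tf.TensorShape([None, T, None])

# The ranks differ (2 vs. 3), so the shapes cannot be compatible.
print(model_input.rank, dataset_audio.rank)
print(model_input.is_compatible_with(dataset_audio))
```

With a real dataset and model, the same comparison can be made with train_dataset.element_spec[0].shape and model.input_shape.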
The model is constructed as follows, and the error points to model.fit(...).
    x_in = layers.Input(shape=(T))
    x = Scattering1D(J, Q=Q)(x_in)
    x = layers.Lambda(lambda x: x[..., 1:, :])(x)
    x = layers.Lambda(lambda x: tf.math.log(tf.abs(x) + log_eps))(x)
    x = layers.GlobalAveragePooling1D(data_format='channels_first')(x)
    x = layers.BatchNormalization(axis=1)(x)
    x_out = layers.Dense(7, activation='softmax')(x)

    model = tf.keras.models.Model(x_in, x_out)
    model.summary()
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_dataset, epochs=50)
Answer
Check the docs regarding tf.keras.utils.audio_dataset_from_directory:
[…] audio has shape (batch_size, sequence_length, num_channels)
The trailing dimension is the channel axis. If you are only working with single-channel audio, just use tf.squeeze to remove it:
train_dataset = train_dataset.map(lambda x, y: (tf.squeeze(x, axis=-1), y))
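To see the effect of that map call without the audio files, here is a small self-contained sketch. It assumes a synthetic dataset: fake_clips is a hypothetical generator standing in for decoded mono clips, and the channel axis is deliberately declared as None so the element_spec matches the one in the question.

```python
import numpy as np
import tensorflow as tf

T = 2**16  # samples per clip, as in the question

def fake_clips():
    # Hypothetical stand-in for decoded mono audio: 4 clips of shape (T, 1).
    for label in range(4):
        yield np.zeros((T, 1), dtype=np.float32), np.int32(label)

# Declare the channel axis as unknown (None) to mirror the question's element_spec.
dataset = tf.data.Dataset.from_generator(
    fake_clips,
    output_signature=(
        tf.TensorSpec(shape=(T, None), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.int32),
    ),
).batch(2)

# Rank-3 audio tensor: (batch, samples, channels).
print(dataset.element_spec[0].shape)

# Drop the channel axis so each batch is (batch, samples).
squeezed = dataset.map(lambda x, y: (tf.squeeze(x, axis=-1), y))
print(squeezed.element_spec[0].shape)
```

Note that tf.squeeze(x, axis=-1) only works when the last axis really has size 1. For stereo files it raises an error at runtime, so those would need to be downmixed or have one channel selected first.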
If you want to keep the dimension, try:
x_in = layers.Input(shape=(T, 1))
I would recommend going through this tutorial.