I’ve created the following model:
import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Concatenate, Dense, LSTM
from tensorflow.keras.models import Model
import keras


def get_model():
    inputs = Input(shape=(None, 2), name='timeseries_input', ragged=True)
    lstm = LSTM(100, activation='tanh')(
        inputs.to_tensor(), mask=tf.sequence_mask(inputs.row_lengths()))
    dense1 = Dense(10, name='dense1')(lstm)
    out1 = Dense(1)(dense1)
    dense2 = Dense(10, name='dense2')(lstm)
    out2 = Dense(1)(dense2)
    dense3 = Dense(10, name='dense3')(lstm)
    out3 = Dense(1)(dense3)
    model = Model(inputs=inputs, outputs=[out1, out2, out3])
    model.summary()
    return model


m = get_model()
m.compile(loss=['mse', 'mse', 'mse'], optimizer='adam', metrics=['mse'])
tf.keras.utils.plot_model(
    m,
    show_shapes=True,
    show_dtype=True,
    show_layer_names=True,
    rankdir="TB",
)
and the following dummy data:
import pandas as pd


def get_ragged_constants(data):
    return tf.RaggedTensor.from_row_lengths(
        values=data[['f1', 'f2']].values,
        row_lengths=data.groupby('grp').size())


data = pd.DataFrame({
    'f1': [1, 2, 3, 4, 11, 22, 33, 6, 7, 8, 9, 8, 66, 55, 88, 99],
    'f2': [4, 3, 2, 1, 44, 33, 22, 66, 55, 44, 33, 22, 1, 2, 3, 4],
    'grp': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4]})
targets = pd.DataFrame({'t1': [1, 2, 1, 2], 't2': [1, 2, 1, 2], 't3': [1, 2, 1, 2]})

x = get_ragged_constants(data)
y = targets.values
with the shapes of (4, None, 2) and (4, 3).
Looking at the model structure, one can see that the model has 3 outputs of shape (None, 1).
I was wondering how come the fit works, when I expected the targets to be of shape (4, 3, 1) and not (4, 3).
m.fit(x,y,epochs=3)

Epoch 1/3
1/1 [==============================] - 4s 4s/step - loss: 7.0158 - dense_21_loss: 1.8125 - dense_22_loss: 2.3360 - dense_23_loss: 2.8673 - dense_21_mse: 1.8125 - dense_22_mse: 2.3360 - dense_23_mse: 2.8673
Epoch 2/3
1/1 [==============================] - 0s 11ms/step - loss: 5.6303 - dense_21_loss: 1.2898 - dense_22_loss: 2.0406 - dense_23_loss: 2.2999 - dense_21_mse: 1.2898 - dense_22_mse: 2.0406 - dense_23_mse: 2.2999
Epoch 3/3
1/1 [==============================] - 0s 8ms/step - loss: 4.4403 - dense_21_loss: 0.8691 - dense_22_loss: 1.7483 - dense_23_loss: 1.8228 - dense_21_mse: 0.8691 - dense_22_mse: 1.7483 - dense_23_mse: 1.8228
<tensorflow.python.keras.callbacks.History at 0x7fda43d41b90>
So I added one more column to the targets and tested the same model with a y of shape (4, 4), and the fit still works... I'm lost.
Question: How should I shape my y to fit the model and what actually happened when I gave it the wrong y shape?
Answer
Both work. Take a look at this and this. The code there is commented ‘Squeeze or expand last dimension if needed’, so if the dimensions match after that step, it’s all good.
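To make the quoted comment concrete, here is a minimal NumPy sketch of what a squeeze-the-last-dimension rule might look like. The helper `align_last_dim` is hypothetical and written only for illustration; it is not the Keras implementation, and it assumes the rule only applies when the ranks differ by exactly one and the extra trailing dimension has size 1.

```python
import numpy as np


def align_last_dim(y_true, y_pred):
    """Hypothetical helper (not Keras code): drop a trailing size-1 axis
    when the two arrays differ in rank by exactly one."""
    if y_pred.ndim - y_true.ndim == 1 and y_pred.shape[-1] == 1:
        y_pred = np.squeeze(y_pred, axis=-1)
    elif y_true.ndim - y_pred.ndim == 1 and y_true.shape[-1] == 1:
        y_true = np.squeeze(y_true, axis=-1)
    return y_true, y_pred


pred = np.zeros((4, 1))  # one model output for a batch of 4

# Rank-1 targets: the trailing axis of the prediction is squeezed away.
t, p = align_last_dim(np.ones(4), pred)
shapes_rank1 = (t.shape, p.shape)

# Rank-2 targets of shape (4, 3): the ranks already match, so nothing is
# squeezed and any later subtraction relies on broadcasting instead.
t, p = align_last_dim(np.ones((4, 3)), pred)
shapes_rank2 = (t.shape, p.shape)

print(shapes_rank1, shapes_rank2)
```

Under these assumptions, the (4, 3) and (4, 4) targets from the question are not reshaped at all; they pass because their shapes are broadcast-compatible with (4, 1).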
First of all remember that everything depends on your loss function. Below I will show one example:
# Get the predictions from the model for input x.
preds = m(x)

# Let's print the shapes.
# (y2 is a second target array of shape (4, 5); its definition is not shown.)
preds = np.array(preds)
x.shape, y.shape, y2.shape, preds.shape
# Result -> (TensorShape([4, None, 2]), (4, 3), (4, 5), (3, 4, 1))

# Let's take an individual look at y2[0] and preds[0]
y2[0], preds[0]
'''
(array([1, 1, 1, 1, 1]),
 array([[-0.1815457 ],
        [-1.0390669 ],
        [ 0.27160883],
        [-0.3232715 ]], dtype=float32))

The thing to notice here is what happens if we do y2[0] - preds[0].
Because the shapes differ, the arrays are first broadcast, and y2[0] becomes:
[[1,1,1,1,1]
 [1,1,1,1,1]
 [1,1,1,1,1]
 [1,1,1,1,1]]
and preds[0] becomes:
array([[-0.1815457 , -0.1815457 , -0.1815457 , -0.1815457 , -0.1815457 ],
       [-1.03906691, -1.03906691, -1.03906691, -1.03906691, -1.03906691],
       [ 0.27160883,  0.27160883,  0.27160883,  0.27160883,  0.27160883],
       [-0.32327151, -0.32327151, -0.32327151, -0.32327151, -0.32327151]])
'''

# Doing y2[0] - preds[0]
y2[0] - preds[0]
'''
Due to the broadcasting described above, the result is:
array([[1.1815457 , 1.1815457 , 1.1815457 , 1.1815457 , 1.1815457 ],
       [2.03906691, 2.03906691, 2.03906691, 2.03906691, 2.03906691],
       [0.72839117, 0.72839117, 0.72839117, 0.72839117, 0.72839117],
       [1.32327151, 1.32327151, 1.32327151, 1.32327151, 1.32327151]])
'''

# Now we take the mean
np.mean(y2[0] - preds[0])
# Result -> 1.3180688247084618

# After doing the whole process with the whole y2 and preds
temp = y2 - preds
np.mean(temp)
# Result -> 1.9192037958030899

# So that was the case of y2. Now let's see the case of y.
# Speeding things up: y[0] - preds[0]
y[0] - preds[0]
'''
The result will be:
array([[1.1815457 , 1.1815457 , 1.1815457 ],
       [2.03906691, 2.03906691, 2.03906691],
       [0.72839117, 0.72839117, 0.72839117],
       [1.32327151, 1.32327151, 1.32327151]])

Can you see the answer? As soon as we take the mean, the result is the same as for y2.
'''
np.mean(y[0] - preds[0])
# Result -> 1.3180688247084618
Hence both target shapes work in this case.
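The walkthrough averages raw differences; MSE squares them first, but the broadcasting behaves the same way. The NumPy sketch below uses the `y` from the question and made-up prediction values (an assumption, not real model output) to show that a (4, 3) target compared against a single (4, 1) output is silently broadcast. It also shows how the targets could instead be split into one (4, 1) array per output, which is my reading of what a model with three (None, 1) outputs expects.

```python
import numpy as np

# Targets from the question, shape (4, 3): one column per output head.
y = np.array([[1, 1, 1],
              [2, 2, 2],
              [1, 1, 1],
              [2, 2, 2]], dtype=float)

# Made-up predictions for ONE output head, shape (4, 1).
pred = np.array([[0.5], [1.5], [0.5], [1.5]])

# (4, 3) - (4, 1) broadcasts to (4, 3): the single prediction column is
# compared against every target column at once.
diff = y - pred
broadcast_mse = np.mean(diff ** 2)

# Splitting the targets gives one (4, 1) array per head, so each head is
# compared only with its own column and no broadcasting is involved.
y_split = [y[:, i:i + 1] for i in range(y.shape[1])]
per_column_mse = [np.mean((t - pred) ** 2) for t in y_split]

# With equally sized columns, the broadcast MSE is the mean of the
# per-column MSEs, which is why the shape mismatch goes unnoticed.
same = np.isclose(broadcast_mse, np.mean(per_column_mse))
print(diff.shape, broadcast_mse, per_column_mse, same)
```

With per-output targets, a call along the lines of `m.fit(x, y_split, epochs=3)` should give each head its own column, so the loss no longer depends on accidental broadcasting.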