Keras model fits on data with the wrong shape

I’ve created the following model:

import tensorflow as tf
import numpy as np
from tensorflow.keras.layers import Input, Embedding, Concatenate, Dense, LSTM
from tensorflow.keras.models import Model
import keras

def get_model():
  inputs = Input( shape =(None,2), name='timeseries_input',ragged=True )
  lstm = LSTM(100, activation='tanh')(inputs.to_tensor(), mask=tf.sequence_mask(inputs.row_lengths()))
  
  dense1 = Dense(10, name='dense1')(lstm)
  out1 = Dense(1)(dense1)
  dense2 = Dense(10, name='dense2')(lstm)
  out2 = Dense(1)(dense2)
  dense3 = Dense(10, name='dense3')(lstm)
  out3 = Dense(1)(dense3)

  model = Model(inputs=inputs, outputs=[out1,out2,out3])
  model.summary()
  return model


m = get_model()
m.compile(loss=['mse','mse','mse'], optimizer='adam', metrics=['mse'])
tf.keras.utils.plot_model(
    m,
    show_shapes=True,
    show_dtype=True,
    show_layer_names=True,
    rankdir="TB",
)

and the following dummy data:

import pandas as pd

def get_ragged_constants(data):
    return tf.RaggedTensor.from_row_lengths(
        values=data[['f1','f2']].values,
        row_lengths=data.groupby('grp').size())
    
data = pd.DataFrame({'f1':[1,2,3,4,11,22,33,6,7,8,9,8,66,55,88,99],'f2':[4,3,2,1,44,33,22,66,55,44,33,22,1,2,3,4],'grp':[1,1,1,1,2,2,2,3,3,3,3,3,4,4,4,4]})
targets = pd.DataFrame({'t1':[1,2,1,2],'t2':[1,2,1,2],'t3':[1,2,1,2]})

x = get_ragged_constants(data)
y = targets.values

with the shapes of (4, None, 2) and (4, 3).
Looking at the model structure one can see that the model has 3 outputs of shape (None, 1).

I was wondering why the fit works, since I expected the targets to need shape (4, 3, 1) rather than (4, 3).

m.fit(x,y,epochs=3)
Epoch 1/3
1/1 [==============================] - 4s 4s/step - loss: 7.0158 - dense_21_loss: 1.8125 - dense_22_loss: 2.3360 - dense_23_loss: 2.8673 - dense_21_mse: 1.8125 - dense_22_mse: 2.3360 - dense_23_mse: 2.8673
Epoch 2/3
1/1 [==============================] - 0s 11ms/step - loss: 5.6303 - dense_21_loss: 1.2898 - dense_22_loss: 2.0406 - dense_23_loss: 2.2999 - dense_21_mse: 1.2898 - dense_22_mse: 2.0406 - dense_23_mse: 2.2999
Epoch 3/3
1/1 [==============================] - 0s 8ms/step - loss: 4.4403 - dense_21_loss: 0.8691 - dense_22_loss: 1.7483 - dense_23_loss: 1.8228 - dense_21_mse: 0.8691 - dense_22_mse: 1.7483 - dense_23_mse: 1.8228
<tensorflow.python.keras.callbacks.History at 0x7fda43d41b90>

So I added one more column to the target and tested the same model with a y of shape (4, 4), and the fit still works. I’m lost.

Question: How should I shape my y to fit the model and what actually happened when I gave it the wrong y shape?

Code on Colab

Answer

Both are correct. Take a look at this and this. As you can see, it says ‘Squeeze or expand last dimension if needed’, so if the dimensions match after that step, it’s all good.
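To picture what "squeeze or expand last dimension" could mean, here is a small NumPy sketch. It is not Keras's code: the function name `squeeze_last_dim_if_needed` and the exact rule (drop a trailing size-1 axis only when the ranks differ by exactly one) are assumptions made for illustration.

```python
import numpy as np

def squeeze_last_dim_if_needed(y_true, y_pred):
    """Illustrative only: drop a trailing size-1 axis when ranks differ by exactly one."""
    if y_pred.ndim - y_true.ndim == 1 and y_pred.shape[-1] == 1:
        y_pred = np.squeeze(y_pred, axis=-1)
    elif y_true.ndim - y_pred.ndim == 1 and y_true.shape[-1] == 1:
        y_true = np.squeeze(y_true, axis=-1)
    return y_true, y_pred

# Ranks differ by one: (4,) targets vs (4, 1) predictions end up as (4,) and (4,).
a, b = squeeze_last_dim_if_needed(np.zeros(4), np.zeros((4, 1)))
print(a.shape, b.shape)  # (4,) (4,)

# Same rank: (4, 3) targets vs (4, 1) predictions are left untouched,
# so any later subtraction has to rely on broadcasting.
c, d = squeeze_last_dim_if_needed(np.zeros((4, 3)), np.zeros((4, 1)))
print(c.shape, d.shape)  # (4, 3) (4, 1)
```

Under this assumed rule, the shapes from the question are never squeezed, which is why broadcasting becomes the relevant mechanism.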

First of all remember that everything depends on your loss function. Below I will show one example:

# Get the model's predictions for the input x.
preds = m(x)

# Let's print the shapes (y2 is another target array, of shape (4, 5), not defined in this snippet)
preds = np.array(preds)
x.shape, y.shape, y2.shape, preds.shape
# Result -> (TensorShape([4, None, 2]), (4, 3), (4, 5), (3, 4, 1))

# Let's take an individual look at the y2[0] and preds[0]
y2[0], preds[0]
'''
(array([1, 1, 1, 1, 1]), 
array([[-0.1815457 ],
        [-1.0390669 ],
        [ 0.27160883],
        [-0.3232715 ]], dtype=float32))


The thing to notice here is what happens when we compute y2[0] - preds[0].

Because the shapes differ, both arrays are first broadcast to shape (4, 5), so y2[0] becomes:
[[1,1,1,1,1]
[1,1,1,1,1]
[1,1,1,1,1]
[1,1,1,1,1]]

and preds[0] will become:
array([[-0.1815457 , -0.1815457 , -0.1815457 , -0.1815457 , -0.1815457 ],
       [-1.03906691, -1.03906691, -1.03906691, -1.03906691, -1.03906691],
       [ 0.27160883,  0.27160883,  0.27160883,  0.27160883,  0.27160883],
       [-0.32327151, -0.32327151, -0.32327151, -0.32327151, -0.32327151]])
'''
# Doing y2[0] - preds[0]
y2[0] - preds[0]
'''
Due to the broadcasting described above, the result is:
array([[1.1815457 , 1.1815457 , 1.1815457 , 1.1815457 , 1.1815457 ],
       [2.03906691, 2.03906691, 2.03906691, 2.03906691, 2.03906691],
       [0.72839117, 0.72839117, 0.72839117, 0.72839117, 0.72839117],
       [1.32327151, 1.32327151, 1.32327151, 1.32327151, 1.32327151]])
'''
# Now we take the mean
np.mean(y2[0] - preds[0])
# Result -> 1.3180688247084618

# Repeating the process with the full y2 and preds arrays
temp = y2 - preds
np.mean(temp)
# result -> 1.9192037958030899

# That was the case of y2. Now let's look at the case of y.
# To speed things up, compute y[0] - preds[0] directly:
y[0]-preds[0]
'''
The results will be:
array([[1.1815457 , 1.1815457 , 1.1815457 ],
       [2.03906691, 2.03906691, 2.03906691],
       [0.72839117, 0.72839117, 0.72839117],
       [1.32327151, 1.32327151, 1.32327151]])
Can you see the answer? As soon as we take the mean, the result is equal to the one for y2.
'''
np.mean(y[0] - preds[0])
# Results  -> 1.3180688247084618

And hence both are working fine in this case.
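
To make the broadcasting concrete at the batch level, and to show one way y could be shaped, here is a minimal NumPy-only sketch. The prediction values and the helper name `mse` are made up for illustration. It assumes, as in the walkthrough above, that a (4, 1) prediction is compared against the whole target array via broadcasting. Splitting the (4, 3) targets into three (4, 1) columns, one per output, avoids the broadcast entirely.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over all elements after NumPy broadcasting (illustrative helper)."""
    return float(np.mean((y_true - y_pred) ** 2))

# Targets from the question: 4 samples, 3 target columns (t1, t2, t3).
y = np.array([[1, 1, 1], [2, 2, 2], [1, 1, 1], [2, 2, 2]], dtype=np.float32)

# Hypothetical predictions of ONE output head: shape (4, 1).
pred = np.array([[0.5], [1.5], [0.0], [2.5]], dtype=np.float32)

# Mismatched shapes: (4, 3) vs (4, 1) broadcast to (4, 3) without an error.
diff = y - pred
print(diff.shape)  # (4, 3)

# One (4, 1) target array per output head, so nothing needs to broadcast.
y_list = [y[:, i:i + 1] for i in range(y.shape[1])]
print([t.shape for t in y_list])  # [(4, 1), (4, 1), (4, 1)]

# The dummy target columns are identical, so the broadcast loss happens to
# equal the loss against a single column.
print(mse(y, pred), mse(y_list[0], pred))  # 0.4375 0.4375

# With distinct columns, the broadcast loss no longer matches the per-column loss.
y_distinct = y * np.array([1.0, 2.0, 3.0], dtype=np.float32)
print(mse(y_distinct, pred), mse(y_distinct[:, :1], pred))
```

With Keras, a list like `y_list` would be passed as `m.fit(x, y_list, epochs=3)`, one array per output. The identical dummy columns are what hide the mismatch here; with real targets the broadcast loss would mix all columns into every output.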

Answer