
Tensorflow: Issues with determining batch size in custom loss function during model fitting (batch size of “None”)

I’m trying to create a custom loss function, in which I have to slice the tensors multiple times. One example is listed below:

# Since different nodes need different activations, I decided to just do it like this
def activations(y_true, y_pred):
    n = y_true.shape[1]
    means = tf.slice(y_pred, begin=[0,0], size=[y_pred.shape[0], n])
    stdevs = tf.slice(y_pred, begin=[0,n], size=[y_pred.shape[0], n])
    corrs = tf.slice(y_pred, begin=[0,2*n], size=[y_pred.shape[0], y_pred.shape[1]-2*n])
    stdevs = keras.activations.softplus(stdevs)
    corrs = keras.activations.tanh(corrs)

This (and the entire loss function) works fine when I test it manually on hand-made tensors y_true and y_pred, but when it is used as the loss of a model it raises an error during model fitting (compiling goes fine).

    File <filename>, line 105, in activations  *
        means = tf.slice(y_pred, begin=[0,0], size=[y_true.shape[0], n])

    TypeError: Expected int32 passed to parameter 'size' of op 'Slice', got [None, 3] of type 'list' instead. Error: Expected int32, but got None of type 'NoneType'.

So apparently, it can’t determine the batch size when executed inside a loss layer.

How do I solve this?

(note: I’m not looking for a solution to this specific code only, since I’m slicing my tensors quite a lot. I’m looking for a general solution to slicing).

I tried to look at this and this and I read through this post. Is writing a custom generator to make the batch size static really the only way to do this?

Thanks in advance!

EDIT: Here’s a (hugely) simplified version of the code, that triggers the error.

import numpy as np
import numpy.random as npr

import keras
from keras import layers

import tensorflow as tf

# Since different nodes need different activations, I decided to just do it like this
def dummy_loss_func(y_true, y_pred):
    n = y_true.shape[1]
    means = tf.slice(y_pred, begin=[0,0], size=[y_pred.shape[0], n])
    stdevs = tf.slice(y_pred, begin=[0,n], size=[y_pred.shape[0], n]) #I'm assuming these are all (0, infty)
    corrs = tf.slice(y_pred, begin=[0,2*n], size=[y_pred.shape[0], y_pred.shape[1]-2*n])
    
    stdevs = keras.activations.softplus(stdevs)
    corrs = keras.activations.tanh(corrs)
    
    relErrors = tf.math.square(means - y_true)/stdevs
    return tf.reduce_mean(tf.math.square(relErrors))

def dummy_model(dim):
    model = keras.Sequential(
    [
        keras.Input(shape=(1,)),
        layers.Dense(2*dim + int(round(dim * (dim-1)/2)), kernel_initializer = tf.keras.initializers.GlorotUniform()),
    ]
    )
    model.summary()
    model.compile(loss=dummy_loss_func, optimizer="adam")
    return model

#Generating some fake data
n = 5000
dim = 3
pts = npr.uniform(size=[n, 2*dim + int(round(dim * (dim-1)/2))])
dummy_in = np.zeros(n)
print(dummy_in.size)
print(pts.size)

#Compiling the model goes fine
model = dummy_model(dim)

# Model execution will go fine
print(model.predict([0]))

# Just calling the loss function also works
print(dummy_loss_func(tf.constant([[3., 2., 1.],[1., 2., 3.]]), tf.constant([[2., 1., 1., 5., 3., 2., 3., 2., 1.], [2., 5., 1., 1., 3., 6., 3., 4., 1.]])))

# The error only comes here
model.fit(dummy_in, pts, verbose=1)


Answer

Let’s work through this together. Likely both of us will need to edit things back and forth.

I’m going to address the slice part of your question, since that was the most tractable given the information.

Let’s instantiate a tensor of shape [3, 3, 3]:

y = tf.constant([ [[1, 2, 3]   , [4, 5, 6   ], [7, 8, 9   ]],                                                                                                          
                  [[10, 11, 12], [13, 14, 15], [16, 17, 18]],                                                                                                 
                  [[19, 20, 21], [22, 23, 24], [25, 26, 27]] ]) 

Notice that this is 1 tensor of shape [3, 3, 3]. Let’s visualize it:

[ins] In [50]: y[0]                                                                                                                                                         
Out[50]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[1, 2, 3],                                                                                                                                                           
       [4, 5, 6],                                                                                                                                                           
       [7, 8, 9]], dtype=int32)>                                                                                                                                            
                                                                                                                                                                            
[ins] In [51]: y[1]                                                                                                                                                         
Out[51]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[10, 11, 12],                                                                                                                                                        
       [13, 14, 15],                                                                                                                                                        
       [16, 17, 18]], dtype=int32)>                                                                                                                                         
                                                                                                                                                                            
[ins] In [52]: y[2]                                                                                                                                                         
Out[52]:                                                                                                                                                                    
<tf.Tensor: shape=(3, 3), dtype=int32, numpy=                                                                                                                               
array([[19, 20, 21],                                                                                                                                                        
       [22, 23, 24],                                                                                                                                                        
       [25, 26, 27]], dtype=int32)>                       

In terms of axes, we can imagine the left-most axis containing three 3×3 matrices, which we referenced above using y[0], y[1], and y[2]. Now let’s carve this cube of numbers.

[nav] In [53]: tf.slice(y, begin=[0, 0, 0], size=[2, 2, 2])                                                                                                                 
Out[53]:                                                                                                                                                                    
<tf.Tensor: shape=(2, 2, 2), dtype=int32, numpy=                                                                                                                            
array([[[ 1,  2],                                                                                                                                                           
        [ 4,  5]],                                                                                                                                                          
                                                                                                                                                                            
       [[10, 11],                                                                                                                                                           
        [13, 14]]], dtype=int32)>                                                                                                                                           
                                            

What’s happening here is that we’re asking for a smaller cube from the bigger cube: it has shape [2, 2, 2] and it starts at the point [0, 0, 0]. So we make three cuts to the bigger cube. First, we go two steps along the axis pointing “into the computer” (the left-most axis), so nothing from the deepest layer shows up (the numbers [19, 20, 21], [22, 23, 24], [25, 26, 27], a matrix of shape [3, 3]). Then we make a horizontal cut, which removes [7, 8, 9] and [16, 17, 18]; [25, 26, 27] was already chopped off by the first cut. Lastly, we make a vertical cut two steps from the origin, which removes [3, 6] and [12, 15].

Counting the losses: the first chop removes nine numbers. The second would have removed nine, but three of those were already gone after the first chop, so it only removes six. The third would also have removed nine, but three were lost to the first chop and two to the second (it would have been three, but one overlaps with the first), which leaves four lost in the last chop. 27 - (9 + 6 + 4) = 8, which is what we got.
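A quick way to check that count is to rebuild the same tensor and compare tf.slice with ordinary Python slice notation. This is only a sketch: the names cube, same_cube and kept are made up for illustration, and y is rebuilt with tf.range instead of being typed out by hand.

```python
import tensorflow as tf

# Same values as the hand-written y above: 1..27 arranged as a 3x3x3 cube.
y = tf.reshape(tf.range(1, 28), (3, 3, 3))

# Start at the origin and keep two steps along every axis.
cube = tf.slice(y, begin=[0, 0, 0], size=[2, 2, 2])

# The same selection written with Python slice notation.
same_cube = y[:2, :2, :2]

kept = int(tf.size(cube))  # number of elements that survive the three cuts
print(kept)
print(bool(tf.reduce_all(cube == same_cube)))
```

The slice notation is often easier to read when begin is all zeros, since the stop index in each axis is simply the size.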

One of the key questions to ask is: do I have a whole batch here, or a single observation from the batch? How can you tell? The left-most axis is the batch axis, and it’s generally represented as None, which means the batch size isn’t fixed ahead of time. Let’s make a batch out of the tensor above, which you can do as follows:

[ins] In [57]: tf.reshape(y, shape=(-1, 3, 3, 3))                                                                                                                           
Out[57]:                                                                                                                                                                    
<tf.Tensor: shape=(1, 3, 3, 3), dtype=int32, numpy=                                                                                                                         
array([[[[ 1,  2,  3],                                                                                                                                                      
         [ 4,  5,  6],                                                                                                                                                      
         [ 7,  8,  9]],                                                                                                                                                     
                                                                                                                                                                            
        [[10, 11, 12],                                                                                                                                                      
         [13, 14, 15],                                                                                                                                                      
         [16, 17, 18]],                                                                                                                                                     
                                                                                                                                                                            
        [[19, 20, 21],                                                                                                                                                      
         [22, 23, 24],                                                                                                                                                      
         [25, 26, 27]]]], dtype=int32)>                                                                                                                                     
                                                                                                                                                                            
[ins] In [58]: tf.reshape(y, shape=(-1, 3, 3, 3)).shape                                                                                                                     
Out[58]: TensorShape([1, 3, 3, 3])                 

What the above says is: reshape my data so that I have a 3x3x3 cube, but also give me something in the left-most (batch) axis. Since there are only 27 numbers, it can’t manufacture more for us (these are our observations, after all), so the -1 resolves to 1 and the tensor just gets one dimension “deeper”. You can see this in the extra pair of [ ]s in the output above. You can also use tf.expand_dims, but I find tf.reshape more intuitive.
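For comparison, here is a small sketch of the tf.expand_dims route mentioned above, next to the tf.reshape one. The variable names are made up for illustration, and y is rebuilt with tf.range rather than typed out.

```python
import tensorflow as tf

# Same 3x3x3 cube of the numbers 1..27 as above.
y = tf.reshape(tf.range(1, 28), (3, 3, 3))

# Route 1: let reshape infer the batch axis (-1 resolves to 1 here).
batched_reshape = tf.reshape(y, shape=(-1, 3, 3, 3))

# Route 2: insert a new axis of length 1 at position 0.
batched_expand = tf.expand_dims(y, axis=0)

values_match = bool(tf.reduce_all(batched_reshape == batched_expand))
print(batched_reshape.shape, batched_expand.shape, values_match)
```

Both give a batch of one observation. The difference is that tf.expand_dims does not need the inner shape spelled out, while tf.reshape makes the intended shape explicit.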

Now we have a batch of size 1, where each observation is a cube of shape [3, 3, 3] which can be assigned to y_pred if you like. Try and run the batch through your functions and see how it works. Another thing that I have found super helpful with dealing with issues of shape is using ipdb and embed mode in ipython. You can set breakpoints and get into the offending lines and observe and fix. Best of luck!

Solution (w/o any fundamental domain knowledge. Apparently tensors are domain agnostic :) )

pts_tensor = tf.constant(pts)                                                                                                                                               
dummy_in_tensor = tf.constant(tf.reshape(dummy_in, (-1,1)))                                                                                                                 
my_ds = tf.data.Dataset.from_tensor_slices((dummy_in_tensor, pts_tensor))                                                                                                   
model.fit(my_ds, verbose=1) 

I think the issue was with the batch axis. To do any better, I’d need to understand the domain, but I got some studying to do :)
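To make the batch-axis point concrete, here is a sketch of slicing without ever reading the static batch size. It is not the Dataset approach above but a different technique, offered under assumptions: the function name split_prediction and the constant N_TARGETS are hypothetical, and the tf.function signature with a None batch axis merely stands in for what happens during model fitting. It relies on two things from the TensorFlow API: a size entry of -1 in tf.slice means "everything remaining along that axis", and tf.shape returns the shape at run time even when the static shape is None.

```python
import tensorflow as tf

N_TARGETS = 3  # hypothetical: number of target columns (n in the question)

# The None in the signature mimics an unknown batch size, so inside the
# function y_pred.shape[0] is None, just like in the traceback.
@tf.function(input_signature=[tf.TensorSpec(shape=[None, 9], dtype=tf.float32)])
def split_prediction(y_pred):
    # size=-1 keeps every row, so the batch size is never needed.
    means = tf.slice(y_pred, begin=[0, 0], size=[-1, N_TARGETS])
    # Python slice notation also leaves the batch axis alone.
    stdevs = y_pred[:, N_TARGETS:2 * N_TARGETS]
    corrs = y_pred[:, 2 * N_TARGETS:]
    # If the batch size itself is needed, ask for it at run time.
    dynamic_batch = tf.shape(y_pred)[0]
    return means, stdevs, corrs, dynamic_batch

means, stdevs, corrs, batch = split_prediction(tf.ones([4, 9]))
print(means.shape, stdevs.shape, corrs.shape, int(batch))
```

The same pattern should carry over to any slice along the non-batch axes, since only the feature dimensions, which are known statically, appear as explicit numbers.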

User contributions licensed under: CC BY-SA