Behavior of Dataset.map in TensorFlow

Question

I'm trying to take variable-length tensors and split them into tensors of length 4, discarding any extra elements (if the length is not divisible by 4). I've therefore written the following function:

This produces the output [<tf.Tensor: shape=(4,), dtype=int32, numpy=array([1, 2, 3, 4], dtype=int32)>], as expected. If I now run the same function using Dataset.map:

I instead get

Accepted Answer

You should use tf.shape to get the dynamic shape of a tensor in graph mode:

    token_length = tf.shape(tokens)[0]

Another problem is that you are using a scalar tensor as the number of splits in graph mode. That won't work either.

Try this:

    import tensorflow as tf

    def body(i, m, n):
        # Append the next chunk of m to the TensorArray n.
        n = n.write(n.size(), m[i:i + chunk_size])
        return tf.add(i, chunk_size), m, n

    def split_data(data, chunk_size):
        # tf.shape returns the runtime length, which is unknown while tracing.
        length = tf.shape(data)[0]
        # Drop trailing elements that do not fill a whole chunk.
        x = data[:(length // chunk_size) * chunk_size]
        ta = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)
        i0 = tf.constant(0)
        c = lambda i, m, n: tf.less(i, tf.shape(x)[0] - 1)
        _, _, out = tf.while_loop(c, body, loop_vars=[i0, x, ta])
        return out.stack()

    chunk_size = 4  # read as a global inside body

    dataset = tf.data.Dataset.from_tensor_slices(
        tf.ragged.constant([[1, 2, 3, 4, 5], [4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]])
    ).map(lambda x: split_data(x, 4)).flat_map(tf.data.Dataset.from_tensor_slices)

    for item in dataset:
        print(item)

Output:

    tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)
    tf.Tensor([4 5 6 7], shape=(4,), dtype=int32)
    tf.Tensor([1 2 3 4], shape=(4,), dtype=int32)
    tf.Tensor([5 6 7 8], shape=(4,), dtype=int32)

And see my other answer here.
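As a side note, the same chunking can probably be done without an explicit loop. The sketch below is not the code from the answer above: it swaps the tf.while_loop for a single tf.reshape, while still relying on tf.shape for the runtime length. The names split_with_reshape and CHUNK_SIZE are made up for this illustration, and it assumes each dataset element is a 1-D int32 tensor, as in the example data.

```python
import tensorflow as tf

CHUNK_SIZE = 4  # hypothetical constant; same chunk length as in the answer


def split_with_reshape(data, chunk_size=CHUNK_SIZE):
    """Split a 1-D tensor into rows of chunk_size, dropping the remainder."""
    # Dataset.map traces this function as a graph, so the static length of a
    # variable-length row is unknown; tf.shape gives the length at run time.
    length = tf.shape(data)[0]
    usable = (length // chunk_size) * chunk_size
    # -1 lets TensorFlow infer the number of chunks from the trimmed length.
    return tf.reshape(data[:usable], [-1, chunk_size])


rows = tf.ragged.constant(
    [[1, 2, 3, 4, 5], [4, 5, 6, 7], [1, 2, 3, 4, 5, 6, 7, 8, 9]]
)

dataset = (
    tf.data.Dataset.from_tensor_slices(rows)
    .map(split_with_reshape)
    # Unbatch the (num_chunks, chunk_size) result into individual chunks.
    .flat_map(tf.data.Dataset.from_tensor_slices)
)

chunks = [item.numpy().tolist() for item in dataset]
print(chunks)
```

Because the reshape target has a fixed second dimension, every element produced by the dataset has the static shape (4,), which is convenient if the chunks are batched later.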
