
Tensorflow 2.3, Tensorflow dataset, TypeError: <lambda>() takes 1 positional argument but 4 were given

I use tf.data.TextLineDataset to read 4 large files and tf.data.Dataset.zip to combine them into a single "dataset". However, I cannot pass a function to dataset.map that uses tf.compat.v1.string_split to split each line on the tab separator ('\t'), so I never get to the point of batching, prefetching, and feeding the data into my model.

This is my code:

d1 = tf.data.TextLineDataset("File1.raw")
d2 = tf.data.TextLineDataset("File2.raw")
d3 = tf.data.TextLineDataset("File3.raw")
d4 = tf.data.TextLineDataset("File4.raw")
dataset = tf.data.Dataset.zip((d1,d2,d3,d4))
dataset = dataset.map(lambda string: tf.compat.v1.string_split([string], sep='\t').values)

This is error message:

packages/tensorflow/python/autograph/impl/api.py", line 339, in _call_unconverted
return f(*args, **kwargs)
TypeError: <lambda>() takes 1 positional argument but 4 were given

What should I do?


Answer

The tf.data.Dataset.zip function iterates over an arbitrary number of dataset objects at the same time. In other words, if you zip four datasets, you get four items at each iteration (one from each dataset). This also explains the error the OP received:

TypeError: <lambda>() takes 1 positional argument but 4 were given
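A minimal sketch illustrating why: each element of a zip of four datasets is a 4-tuple, and map unpacks that tuple into four positional arguments, so a one-argument lambda fails. The toy integer datasets below are stand-ins, not the OP's files.

```python
import tensorflow as tf

# Four tiny datasets standing in for the four text files.
a = tf.data.Dataset.from_tensor_slices([1, 2])
b = tf.data.Dataset.from_tensor_slices([10, 20])
c = tf.data.Dataset.from_tensor_slices([100, 200])
d = tf.data.Dataset.from_tensor_slices([1000, 2000])

zipped = tf.data.Dataset.zip((a, b, c, d))

# Each element is a 4-tuple: one item from each dataset per step.
for w, x, y, z in zipped:
    print(w.numpy(), x.numpy(), y.numpy(), z.numpy())

# zipped.map(lambda s: ...) would raise the TypeError above, because
# map calls the function with four positional arguments, not one.
```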

The function being mapped must accept four arguments, because it is applied to a zip of four datasets. The code below defines a function that takes four arguments (one per dataset) and splits each on the tab character ('\t'); you can map it over the zipped dataset. I substituted sample datasets for the tf.data.TextLineDataset objects.

import tensorflow as tf

d1 = tf.data.Dataset.from_tensors(["foo\t1"])
d2 = tf.data.Dataset.from_tensors(["foo\t2"])
d3 = tf.data.Dataset.from_tensors(["foo\t3"])
d4 = tf.data.Dataset.from_tensors(["foo\t4"])

def split_by_tab(text1, text2, text3, text4):
    sep = "\t"
    return (
        tf.strings.split(text1, sep=sep),
        tf.strings.split(text2, sep=sep),
        tf.strings.split(text3, sep=sep),
        tf.strings.split(text4, sep=sep),
    )

dataset = tf.data.Dataset.zip((d1,d2,d3,d4))
dataset = dataset.map(split_by_tab)
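From here, the remaining steps the question mentions (shuffle, batch, prefetch) can be chained onto the mapped dataset. A self-contained sketch, using a stand-in dataset and placeholder buffer/batch sizes rather than tuned values:

```python
import tensorflow as tf

# Stand-in for the zipped dataset of tab-separated lines.
dataset = tf.data.Dataset.from_tensor_slices(
    (["a\t1", "b\t2"], ["c\t3", "d\t4"])
)

# Split each line on tabs, then shuffle, batch, and prefetch.
dataset = dataset.map(
    lambda x, y: (tf.strings.split(x, sep="\t"),
                  tf.strings.split(y, sep="\t"))
)
dataset = (
    dataset
    .shuffle(buffer_size=1000)   # placeholder size
    .batch(2)                    # placeholder size
    .prefetch(tf.data.AUTOTUNE)
)

# model.fit(dataset)  # feed into the model as usual
```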

As an alternative, could I merge these files into one very large file and then shuffle, batch, and prefetch rows from it? Any other solutions?

The files could be merged, but if they are large, it is probably not worth the effort. I had not realized the features were split across multiple files; in that case, zipping is a reasonable approach.

There is also the tensorflow_text library, which may be relevant to this question and could be worth checking out.

User contributions licensed under: CC BY-SA