After successfully creating a TensorFlow image Dataset with:
dataset = tf.keras.utils.image_dataset_from_directory(...)
which prints
Found 21397 files belonging to 5 classes.
Using 17118 files for training.
I then call the cardinality method:
dataset.cardinality()
which returns a tensor containing the single value
tf.Tensor(535, shape=(), dtype=int64)
I’ve read the docs, but I don’t understand what 535 represents or why it’s different from the number of files.
I ask because I would like to understand how cardinality plays into this equation:
steps_per_epoch = dataset.cardinality().numpy() // batch_size
Answer
The cardinality, in your case, is simply the number of batches, rounded up:
import pathlib

import tensorflow as tf

# Download and unpack the flower photos example dataset.
dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
data_dir = pathlib.Path(data_dir)

batch_size = 32

# Build the training split; the returned dataset is already batched.
train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(180, 180),
    batch_size=batch_size)

# Number of batches in the dataset, not the number of files.
print(train_ds.cardinality())
Found 3670 files belonging to 5 classes.
Using 2936 files for training.
tf.Tensor(92, shape=(), dtype=int64)
The equation is: ceil(2936 / 32) = ceil(91.75) = 92 = cardinality, so it depends on your batch size.
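The same arithmetic can be checked without TensorFlow. This is only a sketch: `expected_batches` is a hypothetical helper, not part of any library, and the second call assumes the question's dataset used the default `batch_size=32` of `image_dataset_from_directory`, which the question does not state.

```python
import math


def expected_batches(num_files: int, batch_size: int) -> int:
    """Number of batches when the final, partially filled batch is kept."""
    return math.ceil(num_files / batch_size)


# Flower photos example from this answer: 2936 training files, batch_size=32.
print(expected_batches(2936, 32))   # 92

# Dataset from the question: 17118 training files.
# batch_size=32 is an assumption (the function's default).
print(expected_batches(17118, 32))  # 535
```

If that assumption holds, 535 is already the number of batches per epoch, so it could probably be used as steps_per_epoch as is; dividing it by batch_size again, as in the question's formula, would cover only a fraction of the data.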