I’m loading images via:

    data = keras.preprocessing.image_dataset_from_directory(
        './data',
        labels='inferred',
        label_mode='binary',
        validation_split=0.2,
        subset="training",
        image_size=(img_height, img_width),
        batch_size=sz_batch,
        crop_to_aspect_ratio=True
    )
I want to use the loaded data in non-TensorFlow routines too, so I need to extract it, e.g. into NumPy arrays. How can I achieve this? I can’t use tfds.
Answer
I would suggest unbatching your dataset and using tf.data.Dataset.map:

    import pathlib

    import numpy as np
    import tensorflow as tf

    dataset_url = "https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz"
    data_dir = tf.keras.utils.get_file('flower_photos', origin=dataset_url, untar=True)
    data_dir = pathlib.Path(data_dir)

    batch_size = 32
    train_ds = tf.keras.utils.image_dataset_from_directory(
        data_dir,
        validation_split=0.2,
        subset="training",
        seed=123,
        image_size=(180, 180),
        batch_size=batch_size,
        shuffle=False)

    # Split the batches back into individual (image, label) pairs.
    train_ds = train_ds.unbatch()
    # shuffle=False keeps the order identical across the two passes below.
    images = np.asarray(list(train_ds.map(lambda x, y: x)))
    labels = np.asarray(list(train_ds.map(lambda x, y: y)))
Or, as suggested in the comments, you could work with the batches directly (that is, without the unbatch() call) and concatenate them afterwards:

    # train_ds is still batched here, so each element already has a leading batch axis.
    images = np.concatenate(list(train_ds.map(lambda x, y: x)))
    labels = np.concatenate(list(train_ds.map(lambda x, y: y)))
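To see why concatenating the batches gives one array with all samples, here is a small NumPy-only sketch. The array shapes and the batch size of 4 are made up for illustration and only stand in for the batches the dataset would yield.

```python
import numpy as np

# Hypothetical stand-in for a dataset: 10 "images" of shape (4, 4, 3) with integer labels.
all_images = np.arange(10 * 4 * 4 * 3, dtype=np.float32).reshape(10, 4, 4, 3)
all_labels = np.arange(10, dtype=np.int32)

# Split into batches of 4; the last batch holds only 2 samples.
image_batches = [all_images[i:i + 4] for i in range(0, 10, 4)]
label_batches = [all_labels[i:i + 4] for i in range(0, 10, 4)]

# np.concatenate joins along the existing batch axis, so uneven batches are fine.
images = np.concatenate(image_batches)
labels = np.concatenate(label_batches)

print(images.shape)  # (10, 4, 4, 3)
print(labels.shape)  # (10,)
```

Note that np.stack would not work on these batches, because it requires every array in the list to have the same shape and the last batch is smaller.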
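Or, if you set shuffle=True, collect images and labels in a single pass with tf.TensorArray, so that each image stays paired with its label (two separate map passes may iterate a shuffled dataset in different orders):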
    images = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
    labels = tf.TensorArray(dtype=tf.int32, size=0, dynamic_size=True)

    # train_ds is the batched dataset here; one loop keeps each image with its label.
    for x, y in train_ds.unbatch():
        images = images.write(images.size(), x)
        labels = labels.write(labels.size(), y)

    # stack() already returns a single tensor with the samples along axis 0.
    images = images.stack()
    labels = labels.stack()
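If the goal is NumPy arrays rather than tensors, another option is a single pass over as_numpy_iterator(), which also keeps images and labels paired. This is a minimal sketch, assuming a small synthetic dataset built with from_tensor_slices in place of the image directory; the shapes and variable names are illustrative and not taken from the answer above.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the image dataset: 10 samples of shape (4, 4, 3).
x = np.arange(10 * 4 * 4 * 3, dtype=np.float32).reshape(10, 4, 4, 3)
y = np.arange(10, dtype=np.int32)
ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(4)

image_batches, label_batches = [], []
# as_numpy_iterator() yields each (images, labels) batch as NumPy arrays.
for image_batch, label_batch in ds.as_numpy_iterator():
    image_batches.append(image_batch)
    label_batches.append(label_batch)

# Join the batches along the batch axis to get one array per component.
images = np.concatenate(image_batches)
labels = np.concatenate(label_batches)
```

Because both lists are filled in the same loop, this should also work when the dataset is shuffled. Keep in mind that it loads the whole dataset into memory.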