I have a standard directory structure of train, validation, and test, and each contains class subdirectories.
...
|train
    |class A
        |1
            |1_1.raw
            |1_2.raw
            ...
        |2
        ...
    |class B
    ...
|test
...
I want to use the flow_from_directory API, but all I can find is an ImageDataGenerator, and the files I have are raw numpy arrays (generated with arr.tofile(...)).
Is there an easy way to use ImageDataGenerator with a custom file loader?
I’m aware of flow_from_dataframe, but that doesn’t seem to accomplish what I want either; it’s for reading images with a more custom organization. I want a simple way to load raw binary files instead of having to re-encode hundreds of thousands of files as JPEGs, losing precision (and wasting time) along the way.
Answer
TensorFlow is an entire ecosystem with I/O capabilities, and ImageDataGenerator is one of its least flexible approaches. See the TensorFlow tutorial on how to load NumPy data:
    import numpy as np
    import tensorflow as tf

    DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

    # Download the archive (cached locally after the first call).
    path = tf.keras.utils.get_file('mnist.npz', DATA_URL)

    # Read the arrays stored in the .npz archive.
    with np.load(path) as data:
        train_examples = data['x_train']
        train_labels = data['y_train']
        test_examples = data['x_test']
        test_labels = data['y_test']

    # Build datasets of (example, label) pairs from the in-memory arrays.
    train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
    test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
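The example above loads a single .npz archive into memory, which is not quite the layout in the question. For a directory of files written with arr.tofile(...), a rough sketch of a loader could look like the following. It uses only NumPy and the standard library. Note that tofile() stores neither dtype nor shape, so SAMPLE_DTYPE and SAMPLE_SHAPE are assumptions that need to be replaced with the real values, and the helper names (list_raw_files, load_raw, iterate_samples) are hypothetical, not part of any library.

```python
from pathlib import Path

import numpy as np

# Assumptions (not stated in the question): every file holds one array of
# this dtype and shape. tofile() saves neither, so both must be supplied.
SAMPLE_DTYPE = np.float32
SAMPLE_SHAPE = (28, 28)


def list_raw_files(split_dir):
    """Return (paths, labels, class_names) for one split such as train/."""
    split_dir = Path(split_dir)
    # Each immediate subdirectory is treated as one class, sorted by name.
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        # rglob also descends into nested folders such as "class A/1/".
        for path in sorted((split_dir / name).rglob("*.raw")):
            paths.append(path)
            labels.append(index)
    return paths, labels, class_names


def load_raw(path, dtype=SAMPLE_DTYPE, shape=SAMPLE_SHAPE):
    """Read one file written by arr.tofile() and restore its shape."""
    return np.fromfile(str(path), dtype=dtype).reshape(shape)


def iterate_samples(split_dir):
    """Yield (array, label) pairs for every .raw file in a split."""
    paths, labels, _ = list_raw_files(split_dir)
    for path, label in zip(paths, labels):
        yield load_raw(path), label
```

A generator like iterate_samples can likely be wrapped with tf.data.Dataset.from_generator (passing an output_signature that matches the assumed dtype and shape) and then shuffled and batched like any other dataset, so no re-encoding to JPEG should be needed.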