Custom data generator

I have a standard directory structure of train, validation, and test, each containing class subdirectories.

...
  |train
      |class A
          |1
              |1_1.raw
              |1_2.raw
              ...
          |2
              ...
      |class B
          ...
  |test
      ...

I want to use the flow_from_directory API, but all I can find is an ImageDataGenerator, and the files I have are raw numpy arrays (generated with arr.tofile(...)).

Is there an easy way to use ImageDataGenerator with a custom file loader?

I’m aware of flow_from_dataframe, but that doesn’t seem to accomplish what I want either; it’s for reading images with a more custom organization. I want a simple way to load raw binary files instead of re-encoding hundreds of thousands of files into JPEGs, which would cost time and lose precision along the way.

Answer

TensorFlow is an entire ecosystem with its own I/O capabilities, and ImageDataGenerator is one of its least flexible approaches. See the TensorFlow guide on how to load NumPy data, which builds a tf.data.Dataset directly from arrays:

import tensorflow as tf
import numpy as np

DATA_URL = 'https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz'

# Download the archive (cached locally after the first call).
path = tf.keras.utils.get_file('mnist.npz', DATA_URL)

# Load the arrays stored in the .npz archive.
with np.load(path) as data:
  train_examples = data['x_train']
  train_labels = data['y_train']
  test_examples = data['x_test']
  test_labels = data['y_test']

# Wrap the in-memory arrays as datasets of (example, label) pairs.
train_dataset = tf.data.Dataset.from_tensor_slices((train_examples, train_labels))
test_dataset = tf.data.Dataset.from_tensor_slices((test_examples, test_labels))
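
The snippet above loads arrays that already fit in memory. For the directory layout in the question, one possible approach is to build the dataset from file paths and decode each raw file on the fly. The following is only a sketch under stated assumptions: the helper names (list_raw_files, load_raw, make_dataset) are hypothetical, and because arr.tofile() stores no dtype or shape information, both have to be supplied by the caller.

```python
from pathlib import Path

import numpy as np


def list_raw_files(split_dir):
    """Return (paths, labels, class_names) for split_dir/<class>/<group>/*.raw."""
    split_dir = Path(split_dir)
    class_names = sorted(p.name for p in split_dir.iterdir() if p.is_dir())
    paths, labels = [], []
    for label, name in enumerate(class_names):
        # rglob descends through the intermediate numbered folders (1, 2, ...).
        for f in sorted((split_dir / name).rglob("*.raw")):
            paths.append(str(f))
            labels.append(label)
    return paths, labels, class_names


def load_raw(path, dtype, shape):
    """Read one file written by arr.tofile().

    dtype and shape are assumptions supplied by the caller, since
    tofile() writes only the raw bytes.
    """
    return np.fromfile(path, dtype=dtype).reshape(shape)


def make_dataset(split_dir, dtype, shape, batch_size=32, shuffle=True):
    """Build a tf.data.Dataset yielding batches of (array, label)."""
    import tensorflow as tf  # imported lazily so the helpers above work without TF

    paths, labels, _ = list_raw_files(split_dir)
    tf_dtype = tf.as_dtype(dtype)

    def parse(path, label):
        raw = tf.io.read_file(path)
        # decode_raw assumes little-endian bytes by default.
        arr = tf.reshape(tf.io.decode_raw(raw, tf_dtype), shape)
        return arr, label

    ds = tf.data.Dataset.from_tensor_slices((paths, labels))
    if shuffle:
        ds = ds.shuffle(len(paths))
    ds = ds.map(parse, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

A call such as make_dataset("train", np.float32, (64, 64, 1)) would then return a dataset that can be passed to model.fit; the dtype and shape here are placeholders and must match whatever was originally written with arr.tofile(...).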