I built a simple CNN model and it raised the following error:
Epoch 1/10
235/235 [==============================] - ETA: 0s - loss: 540.2643 - accuracy: 0.4358
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-14-ab88232c98aa> in <module>()
     15     train_ds,
     16     validation_data=val_ds,
---> 17     epochs=epochs
     18 )

7 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

InvalidArgumentError: Unknown image file format. One of JPEG, PNG, GIF, BMP required.
     [[{{node decode_image/DecodeImage}}]]
     [[IteratorGetNext]] [Op:__inference_test_function_2924]

Function call stack:
test_function
The code I wrote is quite simple and standard; most of it is copied directly from the official website. The error is raised before the first epoch finishes. I am pretty sure that the images are all PNG files. The train folder contains nothing but images (no text or code files). I am using Colab, and the TensorFlow version is 2.5.0. I would appreciate any help.
import tensorflow as tf
from tensorflow.keras import Sequential, layers

# batch_size, image_size, num_classes and epochs are defined elsewhere (not shown)
data_dir = './train'

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='training',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    subset='validation',
    validation_split=0.2,
    batch_size=batch_size,
    seed=42
)

model = Sequential([
    layers.InputLayer(input_shape=(image_size, image_size, 3)),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])

optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

model.compile(
    optimizer=optimizer,
    loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=epochs
)
Answer
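Some of the files in your validation split are not in a format accepted by TensorFlow (JPEG, PNG, GIF, BMP), or may be corrupted. The traceback ends in test_function, which Keras runs during validation, and that is why the error only shows up at the end of the first epoch. A file's extension is indicative only; it does not enforce anything about the content of the file.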
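To make the point about extensions concrete, here is a minimal sketch that identifies a format from the leading bytes of a file rather than from its name. The names `sniff_image_format` and `sniff_file` and the signature table are illustrative assumptions based on the well-known magic numbers of each format; they are not necessarily the exact checks TensorFlow performs internally.

```python
from typing import Optional

# Well-known leading bytes ("magic numbers") for the four formats named in the error.
# Assumption: this table is for illustration and is not taken from TensorFlow's source.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"BM": "bmp",
}


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format suggested by the first bytes of a file, or None."""
    for magic, name in _SIGNATURES.items():
        if header.startswith(magic):
            return name
    return None


def sniff_file(path) -> Optional[str]:
    """Read the first 16 bytes of `path` and sniff its format (hypothetical helper)."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(16))
```

For example, a WebP file (whose content starts with `RIFF`) that has been renamed to `something.png` would make `sniff_file` return `None`, even though the extension looks fine.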
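You might be able to find the culprit using the imghdr module from the Python standard library and a simple loop. (Note that imghdr was deprecated in Python 3.11 and removed in 3.13, so this applies to older interpreters such as the Python 3.7 shown in the traceback.)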
from pathlib import Path
import imghdr

data_dir = "/home/user/datasets/samples/"
image_extensions = [".png", ".jpg"]  # add all your image file extensions here

img_type_accepted_by_tf = ["bmp", "gif", "jpeg", "png"]

for filepath in Path(data_dir).rglob("*"):
    if filepath.suffix.lower() in image_extensions:
        # imghdr inspects the file content, not the extension
        img_type = imghdr.what(filepath)
        if img_type is None:
            print(f"{filepath} is not an image")
        elif img_type not in img_type_accepted_by_tf:
            print(f"{filepath} is a {img_type}, not accepted by TensorFlow")
This should print every file that imghdr does not recognize as an image, and every file whose actual format is not one that TensorFlow accepts, regardless of its extension. You can then either get rid of those files or convert them to a format that TensorFlow supports.
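If you choose to get rid of the offending files, moving them aside is safer than deleting them outright. The sketch below is one way to do that. The function name `quarantine`, its parameters, and the directory layout are hypothetical, and it expects a list of bad paths collected by a check like the loop above (for instance by appending to a list instead of printing).

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def quarantine(bad_files: Iterable[Path], data_dir: Path, quarantine_dir: Path) -> List[Path]:
    """Move each file out of data_dir into quarantine_dir, keeping its relative path.

    Hypothetical helper: the names and directory layout are illustrative only.
    """
    moved = []
    for src in bad_files:
        src = Path(src)
        # Preserve the class sub-folder so the file can be restored later.
        dest = Path(quarantine_dir) / src.relative_to(data_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Keep `quarantine_dir` outside `data_dir`; otherwise `image_dataset_from_directory` would pick it up as one more class folder.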