Keras model.evaluate accuracy stuck at 50 percent while using ImageDataGenerator

I am trying to find the accuracy of my saved Keras model using model.evaluate.

I have loaded in my model using this:

model = keras.models.load_model("../input/modelpred/2_convPerSection_4_sections")

I have a CSV file with two columns, one for the filename of an image and one for the label. Here is a sample:

id,label
95d04f434d05c1565abdd1cbf250499920ae8ecf.tif,0
169d0a4a1dbd477f9c1a00cd090eff28ac9ef2c1.tif,0
51cb2710ab9a05569bbdedd838293c37748772db.tif,1
4bbb675f8fde60e7f23b3354ee8df223d952c83c.tif,1
667a242a7a02095f25e0833d83062e8d14a897cd.tif,0

I have loaded this CSV into a pandas dataframe and fed it into an ImageDataGenerator:

df = pd.read_csv("../input/cancercsv/df_test.csv", dtype=object)

test_path = "../input/histopathologic-cancer-detection/train"

test_data_generator = ImageDataGenerator(rescale=1./255).flow_from_dataframe(dataframe = df,
                                                                                  directory=test_path,
                                                                                  x_col = "id",
                                                                                  y_col = "label",
                                                                                  target_size=(96,96),
                                                                                  batch_size=16,
                                                                                  shuffle=False)

Now I try to evaluate my model using:

val = model.evaluate(test_data_generator, verbose = 1)
print(val)

However, the accuracy stays at exactly 50 percent, even though my model reached 90 percent validation accuracy during training.

Here is what is returned:

163/625 [======>.......................] - ETA: 21s - loss: 1.1644 - accuracy: 0.5000

I confirmed that the model works and that the generator feeds data correctly by plotting an ROC curve with matplotlib and scikit-learn, which gave an AUC of 90 percent, so I’m not sure where the problem is:

predictions = model.predict_generator(test_data_generator, steps=len(test_data_generator), verbose = 1)
false_positive_rate, true_positive_rate, threshold = roc_curve(test_data_generator.classes, np.round(predictions))
area_under_curve = auc(false_positive_rate, true_positive_rate)

plt.plot([0, 1], [0, 1], 'k--')
plt.plot(false_positive_rate, true_positive_rate, label='AUC = {:.3f}'.format(area_under_curve))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()

Similar questions say that the problem came from setting the shuffle parameter of the generator to True, but mine has always been set to False. Another similar problem was fixed by retraining with a sigmoid activation rather than softmax, but I already use sigmoid in my final layer, so that can’t be the problem.

This is my first time using Keras. What did I do wrong?

Answer

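The problem was caused by the class_mode parameter of flow_from_dataframe, which defaults to "categorical". In that mode the generator yields one-hot labels with two columns, whereas a model whose final layer is a single sigmoid unit (as the question suggests) outputs one value per image, so the labels and predictions do not line up.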
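To see why the categorical default can pin accuracy at exactly 0.5, here is a small NumPy sketch. It is not Keras code: it assumes the accuracy metric thresholds the sigmoid output and compares it element-wise with the labels after broadcasting. The helper name binary_accuracy and the sample probabilities are made up for illustration.

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Mean element-wise agreement between labels and thresholded predictions.

    Hypothetical stand-in for a binary accuracy metric; NumPy broadcasting
    is assumed to behave like the metric's own shape handling.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_hat = (np.asarray(y_pred, dtype=float) > threshold).astype(float)
    return float(np.mean(y_true == y_hat))

# Labels from the CSV sample in the question.
labels = np.array([0, 0, 1, 1, 0])

# Made-up sigmoid outputs with shape (5, 1), one value per image.
probs = np.array([[0.1], [0.8], [0.9], [0.3], [0.2]])

one_hot = np.eye(2)[labels]      # what class_mode="categorical" yields, shape (5, 2)
binary = labels.reshape(-1, 1)   # what class_mode="binary" corresponds to, shape (5, 1)

# Each one-hot row holds exactly one 0 and one 1, so a single 0/1 prediction
# always matches one of the two columns and misses the other.
acc_categorical = binary_accuracy(one_hot, probs)
acc_binary = binary_accuracy(binary, probs)

print(acc_categorical, acc_binary)
```

Under these assumptions the one-hot comparison scores 0.5 no matter what the model predicts, which matches the symptom in the question, while the single-column labels give a score that actually depends on the predictions.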

Setting it to "binary" solved the problem. Corrected code:

test_data_generator = ImageDataGenerator(rescale=1./255).flow_from_dataframe(
    dataframe=df,
    directory=test_path,
    x_col="id",
    y_col="label",
    class_mode="binary",  # yield a single 0/1 label per image instead of a one-hot pair
    target_size=(96, 96),
    batch_size=16,
    shuffle=False)
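A quick way to catch this kind of mismatch before calling model.evaluate is to compare the shape of one label batch with the model's output shape. The helper below is a hypothetical sketch (labels_match_output is not a Keras function), and the arrays are stand-ins for real generator batches.

```python
import numpy as np

def labels_match_output(y_batch, output_shape):
    """Return True when a label batch has the same trailing dimensions as the model output.

    y_batch: labels for one batch (assumed to be the second element of a generator batch)
    output_shape: the model's output shape, e.g. (None, 1) for a single sigmoid unit
    """
    y = np.asarray(y_batch)
    if y.ndim == 1:
        # A flat vector of 0/1 labels is treated as one column.
        y = y.reshape(-1, 1)
    return tuple(y.shape[1:]) == tuple(output_shape[1:])

# Stand-ins for real batches: flat 0/1 labels versus one-hot labels.
binary_batch = np.array([0.0, 0.0, 1.0, 1.0], dtype="float32")
categorical_batch = np.eye(2, dtype="float32")[[0, 0, 1, 1]]

ok_binary = labels_match_output(binary_batch, (None, 1))
ok_categorical = labels_match_output(categorical_batch, (None, 1))

print(ok_binary, ok_categorical)

# With the real objects this might look like (not run here):
# _, y_batch = test_data_generator[0]
# labels_match_output(y_batch, model.output_shape)
```

If the check fails for a model with a single sigmoid output, the generator is most likely still producing one-hot labels.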
