I am running a Keras model on the Breast Cancer dataset. I got around 96% accuracy with it, but the confusion matrix is completely off. Here are the graphs:
And here is my confusion matrix:
The matrix suggests that I have no true negatives, only false negatives, when I believe it's the reverse. Another thing I noticed is that when the number of correct predictions is added up and divided by the length of the test set, the result does not match the accuracy score reported by the model. Here is the whole code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras
from tensorflow.math import confusion_matrix
from keras import Sequential
from keras.layers import Dense

breast = load_breast_cancer()
X = breast.data
y = breast.target

#Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

#Scale data
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.fit_transform(X_test)

#Create and fit keras model
model = Sequential()
model.add(Dense(8, activation='relu', input_shape=[X.shape[1]]))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=16, epochs=50, verbose=1)
history = pd.DataFrame(history.history)

#Display loss visualization
history.loc[:,['loss','val_loss']].plot();
history.loc[:,['accuracy','val_accuracy']].plot();

#Create confusion matrix
y_pred = model.predict(X_test)
conf_matrix = confusion_matrix(y_test,y_pred)
cm = sns.heatmap(conf_matrix, annot=True, cmap='gray', annot_kws={'size':30})
cm_labels = ['Positive','Negative']
cm.set_xlabel('True')
cm.set_xticklabels(cm_labels)
cm.set_ylabel('Predicted')
cm.set_yticklabels(cm_labels);
Am I doing something wrong here? Am I missing something?
Answer
Check how the values are arranged in the official sklearn.metrics.confusion_matrix documentation (tf.math.confusion_matrix, which you are using, follows the same convention): rows are the true labels and columns are the predictions, so the cells are organized like this (see the short example after this list):
TN: upper left corner
FP: upper right corner
FN: lower left corner
TP: lower right corner
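For example, a minimal sketch with made-up labels (not your data) showing the arrangement:

from sklearn.metrics import confusion_matrix

# Tiny illustrative example: rows are the true classes, columns the predictions
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]

print(confusion_matrix(y_true, y_pred))
# [[2 1]    row 0: TN=2, FP=1
#  [1 2]]   row 1: FN=1, TP=2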
You’re getting 53 true negatives and 90 false negatives from the current confusion matrix.
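If it helps, here is a minimal sketch, assuming the model, X_test, y_test, and seaborn setup from your code, of how the sigmoid probabilities could be thresholded at 0.5 and the heatmap axes labelled so that rows are the true classes and columns the predictions (sklearn's confusion_matrix is used here; tf.math.confusion_matrix arranges rows and columns the same way):

from sklearn.metrics import confusion_matrix

# Threshold the sigmoid probabilities at 0.5 to get hard 0/1 predictions
y_prob = model.predict(X_test)
y_pred = (y_prob > 0.5).astype(int).ravel()

conf_matrix = confusion_matrix(y_test, y_pred)

# Rows of the matrix are the true labels, columns are the predictions
cm = sns.heatmap(conf_matrix, annot=True, cmap='gray', annot_kws={'size': 30})
cm_labels = ['Negative', 'Positive']   # class 0, class 1
cm.set_xlabel('Predicted')
cm.set_xticklabels(cm_labels)
cm.set_ylabel('True')
cm.set_yticklabels(cm_labels)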