I would like to add a custom metric to a Keras model. I'm debugging my (otherwise working) code and I can't find a way to do the operations I need.
The problem can be described as multiclass classification via multinomial logistic regression. The custom metric I would like to implement is:
(1/Number_of_Classes)*(TruePositivesClass1/TotalElementsClass1 + TruePositivesClass2/TotalElementsClass2 + ... + TruePositivesClassN/TotalElementsClassN)
where Number_of_Classes must be calculated from the batch, i.e. something like len(np.unique(y_true)), and every summation term would be something like

np.sum((y_true == class_i) & (y_pred == class_i)) / np.sum(y_true == class_i)
In terms of a confusion matrix (in the minimal two-class form), with predicted classes in rows and actual classes in columns:

              Actual True   Actual False
Pred. True         15             3
Pred. False        12             1

the formula would be 0.5 * (15 / (15 + 12)) + 0.5 * (1 / (1 + 3)) ≈ 0.4028.
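As a quick numpy check of that arithmetic (a sketch, reading the matrix above with predictions in rows and actual classes in columns):

import numpy as np

cm = np.array([[15, 3],
               [12, 1]])            # rows = predicted, columns = actual
true_positives = np.diag(cm)        # [15, 1]
class_totals = cm.sum(axis=0)       # per-class element counts (column sums): [27, 4]
print(np.mean(true_positives / class_totals))  # 0.40277...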
The code could be something like:
import numpy as np

def custom_metric(y_true, y_pred):
    # Classes that actually appear in this batch
    classes = np.unique(y_true)
    summation = 0
    for c in classes:
        # Number of predictions of class c that are correct (true positives)
        true_predicts = np.sum((y_true == c) & (y_pred == c))
        # Total number of items of class c in the batch y_true
        true_values = np.sum(y_true == c)
        summation += true_predicts / true_values
    # Average over the classes present in the batch
    return summation / len(classes)
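For instance, on a small hypothetical batch of integer labels, this should give:

y_true = np.array([0, 1, 1, 0])
y_pred = np.array([0, 0, 1, 0])
print(custom_metric(y_true, y_pred))  # 0.75 = mean(class 0 recall 1.0, class 1 recall 0.5)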
My preprocessed data is a numpy array like x = [v1, v2, v3, v4, ..., vn], and my objective column is a numpy array y = [1, 0, 1, 0, 1, 0, 0, 1, ..., 0, 1].
Then they are converted to tensors:
x_train = tf.convert_to_tensor(x)
y_train = tf.convert_to_tensor(tf.keras.utils.to_categorical(y))
Then they are combined into a tensorflow dataset:
train_ds = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(x_train),
                                tf.data.Dataset.from_tensor_slices(y_train)))
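(As a side note, from_tensor_slices also accepts a tuple of tensors, which should be equivalent to the zip above:)

train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))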
Later, I take an iterator:
train_itr = iter(
    train_ds.shuffle(len(y_train) * 5, reshuffle_each_iteration=True).batch(len(y_train))
)
And last, I take one element from the iterator and train:
x_train, y_train = train_itr.get_next()
model.fit(x=x_train, y=y_train,
          batch_size=batch_size,
          epochs=epochs,
          callbacks=[custom_callback],
          validation_data=test_itr.get_next())
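(I understand model.fit can also consume a tf.data.Dataset directly, so the manual iterator step may not be strictly necessary; a sketch, where test_ds is a hypothetical test dataset built like train_ds:)

model.fit(train_ds.shuffle(len(y_train) * 5, reshuffle_each_iteration=True).batch(batch_size),
          epochs=epochs,
          callbacks=[custom_callback],
          validation_data=test_ds.batch(batch_size))  # test_ds: hypothetical, built like train_ds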
So, since these objects are dataset iterators, I can't find functions to operate on them the way I'd like in order to compute the custom metric described.
Answer
So you want to calculate the average recall with respect to the classes present in the batch. Here is my example code, first using numpy and tensorflow:
import tensorflow as tf
import numpy as np

y_t = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1]], dtype=np.float32)
y_p = np.array([[1, 0, 0, 0],
                [1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 1]], dtype=np.float32)

def average_recall(y_true, y_pred):
    # Get indexes of both labels and predictions
    labels = np.argmax(y_true, axis=1)
    predictions = np.argmax(y_pred, axis=1)
    # Get confusion matrix from labels and predictions
    confusion_matrix = tf.math.confusion_matrix(labels, predictions).numpy()
    # Get number of all true positives in each class
    all_true_positives = np.diag(confusion_matrix)
    # Get number of all elements in each class
    all_class_sum = np.sum(confusion_matrix, axis=1)
    # Get rid of classes that don't show in batch
    zero_index = np.where(all_class_sum == 0)[0]
    all_true_positives = np.delete(all_true_positives, zero_index)
    all_class_sum = np.delete(all_class_sum, zero_index)
    print("confusion_matrix:\n {},\n all_true_positives:\n {},\n all_class_sum:\n {}".format(
        confusion_matrix, all_true_positives, all_class_sum))
    # Average TruePositives / TotalElements wrt all classes that show in batch
    return np.mean(all_true_positives / all_class_sum)

avg_recall = average_recall(y_t, y_p)
print(avg_recall)
Outputs:
confusion_matrix:
 [[1 0 0 0]
 [1 1 0 0]
 [0 0 0 0]
 [0 0 0 2]],
 all_true_positives:
 [1 1 2],
 all_class_sum:
 [1 2 2]
0.8333333333333334
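As a sanity check (assuming scikit-learn is available), this quantity is the macro-averaged recall, so it should match sklearn.metrics.recall_score with average='macro' as long as every predicted class also appears in y_true (sklearn averages over labels seen in either array):

from sklearn.metrics import recall_score

labels = np.argmax(y_t, axis=1)        # [0, 1, 1, 3, 3]
predictions = np.argmax(y_p, axis=1)   # [0, 0, 1, 3, 3]
print(recall_score(labels, predictions, average="macro"))  # 0.8333333333333334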
And the same implemented using only tensorflow:
import tensorflow as tf

y_t = tf.constant([[1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 1]], dtype=tf.float32)
y_p = tf.constant([[1, 0, 0, 0],
                   [1, 0, 0, 0],
                   [0, 1, 0, 0],
                   [0, 0, 0, 1],
                   [0, 0, 0, 1]], dtype=tf.float32)

def average_recall(y_true, y_pred):
    # Get indexes of both labels and predictions
    labels = tf.argmax(y_true, axis=1)
    predictions = tf.argmax(y_pred, axis=1)
    # Get confusion matrix from labels and predictions
    confusion_matrix = tf.math.confusion_matrix(labels, predictions)
    # Get number of all true positives in each class
    all_true_positives = tf.linalg.diag_part(confusion_matrix)
    # Get number of all elements in each class
    all_class_sum = tf.reduce_sum(confusion_matrix, axis=1)
    # Get rid of classes that don't show in batch
    mask = tf.not_equal(all_class_sum, tf.constant(0))
    all_true_positives = tf.boolean_mask(all_true_positives, mask)
    all_class_sum = tf.boolean_mask(all_class_sum, mask)
    print("confusion_matrix:\n {},\n all_true_positives:\n {},\n all_class_sum:\n {}".format(
        confusion_matrix, all_true_positives, all_class_sum))
    # Average TruePositives / TotalElements wrt all classes that show in batch
    return tf.reduce_mean(all_true_positives / all_class_sum)

avg_recall = average_recall(y_t, y_p)
print(avg_recall)
Outputs:
confusion_matrix:
 [[1 0 0 0]
 [1 1 0 0]
 [0 0 0 0]
 [0 0 0 2]],
 all_true_positives:
 [1 1 2],
 all_class_sum:
 [1 2 2]
tf.Tensor(0.8333333333333334, shape=(), dtype=float64)
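To actually use this as a Keras metric, the tensorflow-only version can be passed to model.compile (a sketch; remove the Python print first, since in graph mode it would only fire at trace time, or swap it for tf.print):

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=[average_recall])  # Keras calls it with (y_true, y_pred) on each batch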
Reference:
Calculate precision and recall for multiclass classification using confusion matrix