```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Input, Dense, Dropout, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# We have 2 inputs, 1 for each picture
left_input = Input(img_size)
right_input = Input(img_size)

# We will use 2 instances of 1 network for this task
convnet = MobileNetV2(weights='imagenet', include_top=False,
                      input_shape=img_size, input_tensor=None)
convnet.trainable = True

x = convnet.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = Dense(320, activation='relu')(x)
x = Dropout(0.2)(x)
preds = Dense(101, activation='sigmoid')(x)  # Apply sigmoid
convnet = Model(inputs=convnet.input, outputs=preds)

# Connect each 'leg' of the network to each input
# Remember, they have the same weights
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# Getting the L1 distance between the 2 encodings
L1_layer = Lambda(lambda tensor: K.abs(tensor[0] - tensor[1]))

# Add the distance function to the network
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1, activation='sigmoid')(L1_distance)
siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)

optimizer = Adam(lr, decay=2.5e-4)
# TODO: get layerwise learning rates and momentum annealing scheme described in paper working
siamese_net.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=optimizer, metrics=['accuracy'])
siamese_net.summary()
```
The result of training is as follows:
```
Epoch 1/10
126/126 [==============================] - 169s 1s/step - loss: 0.5683 - accuracy: 0.6840 - val_loss: 0.4644 - val_accuracy: 0.8044
Epoch 2/10
126/126 [==============================] - 163s 1s/step - loss: 0.2032 - accuracy: 0.9795 - val_loss: 0.2117 - val_accuracy: 0.9681
Epoch 3/10
126/126 [==============================] - 163s 1s/step - loss: 0.1110 - accuracy: 0.9925 - val_loss: 0.1448 - val_accuracy: 0.9840
Epoch 4/10
126/126 [==============================] - 164s 1s/step - loss: 0.0844 - accuracy: 0.9950 - val_loss: 0.1384 - val_accuracy: 0.9820
Epoch 5/10
126/126 [==============================] - 163s 1s/step - loss: 0.0634 - accuracy: 0.9990 - val_loss: 0.0829 - val_accuracy: 1.0000
Epoch 6/10
126/126 [==============================] - 165s 1s/step - loss: 0.0526 - accuracy: 0.9995 - val_loss: 0.0729 - val_accuracy: 1.0000
Epoch 7/10
126/126 [==============================] - 164s 1s/step - loss: 0.0465 - accuracy: 0.9995 - val_loss: 0.0641 - val_accuracy: 1.0000
Epoch 8/10
126/126 [==============================] - 163s 1s/step - loss: 0.0463 - accuracy: 0.9985 - val_loss: 0.0595 - val_accuracy: 1.0000
```
The model predicts with good accuracy when I compare two dissimilar images, and it also performs well on images of the same class. However, when I compare image1 with image1 itself, it predicts that they are similar with a probability of only 0.5. In contrast, if I compare image1 with image2 (where image1 and image2 belong to the same class), it predicts correctly with a probability of 0.8.

So when I compare distinct images the predictions are correct; I have tried different alternatives, but none of them worked. What might be the error?
Answer
The L1 distance between two equal vectors is always zero. When you pass the same image to both inputs, the encodings generated are equal (`encoded_l` is equal to `encoded_r`). Hence, the input to your final sigmoid layer is a zero vector, and sigmoid(0) = 0.5. This is the reason providing identical inputs to your model gives 0.5 as the output.
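The effect can be reproduced with plain NumPy. The encoding values and weights below are made-up numbers for illustration, and the final `Dense(1)` layer is assumed to have a zero bias (a learned non-zero bias would shift the output slightly away from 0.5):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical encodings produced by the shared convnet for the SAME image:
# both legs use identical weights, so the two outputs are identical.
encoded_l = np.array([0.7, -1.2, 3.4])
encoded_r = np.array([0.7, -1.2, 3.4])

# The Lambda layer computes the element-wise L1 distance, which is all zeros.
l1_distance = np.abs(encoded_l - encoded_r)
print(l1_distance)       # [0. 0. 0.]

# Arbitrary illustrative weights for the final Dense(1) layer (bias assumed 0).
w = np.array([0.5, -0.3, 1.1])
logit = l1_distance @ w  # 0.0 regardless of the weights
print(sigmoid(logit))    # 0.5
```

Whatever the final layer's weights are, a zero distance vector produces a zero logit, so the sigmoid output is pinned at 0.5 for any identical pair.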