```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Input, Dense, Dropout, Lambda
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# We have 2 inputs, 1 for each picture
left_input = Input(img_size)
right_input = Input(img_size)

# We will use 2 instances of 1 network for this task
convnet = MobileNetV2(weights='imagenet', include_top=False,
                      input_shape=img_size, input_tensor=None)
convnet.trainable = True

x = convnet.output
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = Dense(320, activation='relu')(x)
x = Dropout(0.2)(x)
preds = Dense(101, activation='sigmoid')(x)  # Apply sigmoid
convnet = Model(inputs=convnet.input, outputs=preds)

# Connect each 'leg' of the network to each input
# Remember, they have the same weights
encoded_l = convnet(left_input)
encoded_r = convnet(right_input)

# Getting the L1 distance between the 2 encodings
L1_layer = Lambda(lambda tensor: K.abs(tensor[0] - tensor[1]))

# Add the distance function to the network
L1_distance = L1_layer([encoded_l, encoded_r])
prediction = Dense(1, activation='sigmoid')(L1_distance)
siamese_net = Model(inputs=[left_input, right_input], outputs=prediction)

optimizer = Adam(lr, decay=2.5e-4)
# TODO: get layerwise learning rates and momentum annealing scheme described in paper working
siamese_net.compile(loss=keras.losses.binary_crossentropy,
                    optimizer=optimizer, metrics=['accuracy'])
siamese_net.summary()
```
The result of training is as follows:
```
Epoch 1/10
126/126 [==============================] - 169s 1s/step - loss: 0.5683 - accuracy: 0.6840 - val_loss: 0.4644 - val_accuracy: 0.8044
Epoch 2/10
126/126 [==============================] - 163s 1s/step - loss: 0.2032 - accuracy: 0.9795 - val_loss: 0.2117 - val_accuracy: 0.9681
Epoch 3/10
126/126 [==============================] - 163s 1s/step - loss: 0.1110 - accuracy: 0.9925 - val_loss: 0.1448 - val_accuracy: 0.9840
Epoch 4/10
126/126 [==============================] - 164s 1s/step - loss: 0.0844 - accuracy: 0.9950 - val_loss: 0.1384 - val_accuracy: 0.9820
Epoch 5/10
126/126 [==============================] - 163s 1s/step - loss: 0.0634 - accuracy: 0.9990 - val_loss: 0.0829 - val_accuracy: 1.0000
Epoch 6/10
126/126 [==============================] - 165s 1s/step - loss: 0.0526 - accuracy: 0.9995 - val_loss: 0.0729 - val_accuracy: 1.0000
Epoch 7/10
126/126 [==============================] - 164s 1s/step - loss: 0.0465 - accuracy: 0.9995 - val_loss: 0.0641 - val_accuracy: 1.0000
Epoch 8/10
126/126 [==============================] - 163s 1s/step - loss: 0.0463 - accuracy: 0.9985 - val_loss: 0.0595 - val_accuracy: 1.0000
```
The model predicts with good accuracy when I compare two dissimilar images, and it also performs well on images of the same class. However, when I compare image1 with image1 itself, it predicts that they are similar with a probability of only 0.5. In contrast, if I compare image1 with image2 (where image1 and image2 belong to the same class), it predicts correctly with a probability of 0.8.

So when I compare distinct images the predictions are correct; I have tried different alternatives, but none of them worked. What might be the error?
Answer
The L1 distance between two equal vectors is always zero. When you pass the same image to both inputs, the encodings generated are equal (`encoded_l` is equal to `encoded_r`). Hence, the input to your final sigmoid layer is a zero vector, and sigmoid(0) = 0.5. This is the reason providing identical inputs to your model gives 0.5 as the output.
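The effect can be reproduced with plain NumPy. The encoding values and weights below are made-up numbers for illustration, and the final `Dense(1)` layer is assumed to have a zero bias (a learned non-zero bias would shift the output slightly away from 0.5):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical encodings produced by the shared convnet for the SAME image:
# both legs use identical weights, so the two outputs are identical.
encoded_l = np.array([0.7, -1.2, 3.4])
encoded_r = np.array([0.7, -1.2, 3.4])

# The Lambda layer computes the element-wise L1 distance, which is all zeros.
l1_distance = np.abs(encoded_l - encoded_r)
print(l1_distance)       # [0. 0. 0.]

# Arbitrary illustrative weights for the final Dense(1) layer (bias assumed 0).
w = np.array([0.5, -0.3, 1.1])
logit = l1_distance @ w  # 0.0 regardless of the weights
print(sigmoid(logit))    # 0.5
```

Whatever the final layer's weights are, a zero distance vector produces a zero logit, so the sigmoid output is pinned at 0.5 for any identical pair.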