
Two identical images have different hashes and I can’t figure out why

I have a directory containing a single image of a baseball, 1.jpg. I use cv2 to read the image, then write it back into the same directory as 2.jpg, so 1.jpg and 2.jpg should be identical. For each image I then calculate a “difference” hash of length 256 using the function get_hash and print it. The two hashes are almost identical but differ by at least 1 bit, and I cannot figure out why. I thought it could be due to JPG compression when the image was copied, so I also ran the code using PNG format for both images and still got different hash values. Any insight would be appreciated. The code is shown below.

import math
import cv2

def get_hash(fpath, hash_length):
    dim = int(math.sqrt(hash_length))  # with hash_length=256, dim=16
    r_str = ''
    img = cv2.imread(fpath, 0)  # read the image as a grayscale image
    img = cv2.resize(img, (dim, dim), interpolation=cv2.INTER_NEAREST)
    img = img.flatten()  # now a vector of dim*dim = 256 pixel values
    list2 = list(img)
    # compare each pixel with its successor; 256 pixels give 255 bits
    for col in range(0, len(list2) - 1):
        if list2[col] > list2[col + 1]:
            value = '1'
        else:
            value = '0'
        r_str = r_str + value
    return r_str

def match(value1, value2, distance):
    # returns True if the number of mismatches between the hashes is at most distance
    # with distance=0 returns True only if the hashes are identical
    mismatch_count = 0
    for i in range(0, len(value1)):
        if value1[i] != value2[i]:
            mismatch_count += 1
    if mismatch_count > distance:
        return False
    else:
        return True

path_to_image=r'C:Tempballsdup31.jpg'
img=cv2.imread(path_to_image)
path_to_write_image=r'C:Tempballsdup32.jpg'
cv2.imwrite(path_to_write_image, img) # write the identical image to directory with file name 2.jpg
hash_length = 256
h1=get_hash(path_to_image, hash_length)
h2=get_hash(path_to_write_image, hash_length)
print (h1)
print (h2)
distance = 0 # both hashes must match identically
m = match(h1, h2, distance)
print (m) # should be true since the images are identical  but returns false
# because there is a single bit difference in the two hashes

The 256-length hash is too long to put here, but here is the region in which the two hash values differ by 1 bit (6th bit from the end):

hash for 1.jpg 
00000000000000000000011000000000000010001001000000110000000010000010001
hash for 2.jpg
00000000100000000000011000000000000010001001000000110000000010000110001

Answer

[JPG]

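The saved image 2.jpg is different from the original image 1.jpg. JPEG is a lossy format, so cv2.imwrite re-encodes the decoded pixels instead of copying the file.

You can confirm this by comparing the two images with an online image comparison tool.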

[BMP]

I tried re-saving the image as BMP. The two images are then exactly equal, and their hash values are equal as well.

[PNG]

When converted to PNG, the images are equal, but I found that their bit depths are different.

User contributions licensed under: CC BY-SA