Just for debugging purposes, I would like to map a big string (a session_id, which is difficult to visualize) to a, let's say, 6-character "hash". This hash does not need to be secure in any way, just cheap to compute, and of fixed and reduced length (MD5 is too long). The input string can have any length.
How would you implement this "cheap_hash" in Python so that it is not expensive to compute? It should generate something like this:
```python
def compute_cheap_hash(txt, length=6):
    # do some computation
    return cheap_hash

print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
# desired output, something like: aBxr5u
```
Answer
I can't recall whether MD5 output is exactly uniformly distributed, but it is designed to change a lot even for the smallest difference in the input.
Don't trust my math, but I guess the chance that two given inputs collide on the first 6 digits of the MD5 hexdigest is 1 in 16^6 = 16,777,216, which is about 1 in 17 million.
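The 1-in-16.8-million figure applies to one specific pair of inputs. When you are looking at many session IDs at once, the birthday effect makes some collision among them far more likely. The sketch below uses the standard birthday approximation and assumes the 6-character MD5 prefixes behave like uniformly random values. The helper name collision_probability is hypothetical.

```python
import math

def collision_probability(n_items, hex_chars=6):
    """Approximate chance that at least two of n_items inputs share the same
    truncated hex prefix, assuming the prefixes are uniformly random."""
    space = 16 ** hex_chars  # 16,777,216 possible 6-hex-character prefixes
    # Birthday approximation: P(collision) ~ 1 - exp(-n(n-1) / (2N))
    return 1 - math.exp(-n_items * (n_items - 1) / (2 * space))

print(16 ** 6)  # 16777216
for n in (100, 1000, 4823, 10000):
    print(n, round(collision_probability(n), 4))
```

Under that assumption, the odds of at least one collision reach roughly 50% at around 4,800 distinct session IDs. That is usually fine for eyeballing logs, but it is not a unique identifier.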
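So you can just write:

```python
import hashlib
cheap_hash = lambda text: hashlib.md5(text.encode("utf-8")).hexdigest()[:6]
```

In Python 3, hashlib works on bytes, so the string has to be encoded first. After that you can use h = cheap_hash(any_input) anywhere. Avoid naming the result hash, because that would shadow Python's built-in hash() function.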
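The example output in the question (aBxr5u) is mixed-case, while a hex digest only uses 0-9 and a-f. The following variant is a sketch and not the method above. It encodes the raw MD5 bytes with URL-safe base64, which packs 6 bits into each character instead of 4. Six characters therefore carry 36 bits, giving per-pair collision odds of about 1 in 2^36 (roughly 68.7 billion) instead of 1 in 16^6. It reuses the question's compute_cheap_hash signature. The choice of base64 is an assumption about what "something like aBxr5u" should look like.

```python
import base64
import hashlib

def compute_cheap_hash(txt, length=6):
    # MD5 serves only as a fast, well-mixed digest here, not for security.
    digest = hashlib.md5(txt.encode("utf-8")).digest()  # 16 raw bytes
    # URL-safe base64 uses A-Z, a-z, 0-9, '-' and '_' (6 bits per character).
    # A 16-byte digest encodes to 22 characters plus '==' padding,
    # so keep length <= 22 to avoid returning padding characters.
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    return encoded[:length]

print(compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
```

The output may contain '-' or '_'. If you need strictly alphanumeric output, you would have to use a different encoding, such as base62.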
PS: Any hash algorithm from hashlib can be used. MD5 is typically a little cheaper to compute, but hashlib.sha256 is also a popular choice.
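Swapping the algorithm only changes the hashlib constructor. The sketch below shows SHA-256 truncated the same way, plus BLAKE2b. BLAKE2b (available in hashlib since Python 3.6) accepts a digest_size argument, so it can produce a short digest directly instead of computing a long one and slicing it. Both function names are hypothetical, and neither is meant for security use here.

```python
import hashlib

def cheap_hash_sha256(text, length=6):
    # Same idea as the MD5 version: truncate the hex digest to `length` chars.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:length]

def cheap_hash_blake2b(text, length=6):
    # Each digest byte becomes 2 hex characters, so round the byte count up
    # and trim the extra character when `length` is odd (digest_size: 1..64).
    n_bytes = (length + 1) // 2
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=n_bytes)
    return digest.hexdigest()[:length]

print(cheap_hash_sha256("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
print(cheap_hash_blake2b("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"))
```

For short strings like session IDs, the per-call overhead in Python is likely to matter more than the differences between these algorithms.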