Cheap mapping of string to small fixed-length string

Just for debugging purposes, I would like to map a big string (a session_id, which is difficult to visualize) to a, let’s say, 6-character “hash”. This hash does not need to be secure in any way, just cheap to compute and of a fixed, short length (md5 is too long). The input string can have any length.

How would you implement this “cheap_hash” in python so that it is not expensive to compute? It should generate something like this:

def compute_cheap_hash(txt, length=6):
    # do some computation
    return cheap_hash

print compute_cheap_hash("SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed")
aBxr5u

Answer

I can’t recall if MD5 is uniformly distributed, but it is designed to change a lot even for the smallest difference in the input.

Don’t trust my math, but I believe the chance that two given inputs collide on the first 6 digits of the MD5 hexdigest is 1 in 16^6, which is about 1 in 17 million.
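
To check that arithmetic, here is a small sketch. It assumes the truncated digest behaves like a uniformly random 6-digit hex value, which is an idealization rather than a proven property of MD5. The function names are made up for illustration, and the second function uses the standard birthday approximation to estimate the chance of at least one collision among many inputs.

```python
import math

def truncated_hash_space(hex_digits=6):
    # Each hex digit has 16 possible values.
    return 16 ** hex_digits

def any_collision_probability(n_inputs, hex_digits=6):
    # Birthday approximation: 1 - exp(-pairs / space),
    # assuming uniformly distributed outputs.
    space = truncated_hash_space(hex_digits)
    pairs = n_inputs * (n_inputs - 1) / 2
    return 1.0 - math.exp(-pairs / space)

print(truncated_hash_space())           # 16777216
print(any_collision_probability(1000))  # roughly 0.03
```

So a single pair collides about once in 17 million, but with around a thousand session ids in the same log the chance that some pair shares a prefix is already about 3%. That is usually fine for debugging, though not for anything that needs uniqueness.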

So you can simply use cheap_hash = lambda input: hashlib.md5(input).hexdigest()[:6]. Note that in Python 3 hashlib.md5 expects bytes, so pass input.encode() if input is a str.

After that you can use hash = cheap_hash(any_input) anywhere.
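
For reference, a minimal Python 3 sketch of the same idea, written with the compute_cheap_hash signature from the question. The UTF-8 encoding step is an assumption about how the session id should be turned into bytes. The result consists of hex digits only, so it will not look like the mixed-case sample in the question, and length can be at most 32 for MD5.

```python
import hashlib

def compute_cheap_hash(txt, length=6):
    # hashlib works on bytes, so encode str input first (UTF-8 assumed).
    data = txt.encode("utf-8") if isinstance(txt, str) else txt
    # Keep only the first `length` hex digits of the 32-digit MD5 digest.
    return hashlib.md5(data).hexdigest()[:length]

session_id = "SDFSGSADSADFSasdfgsadfSDASAFSAGAsaDSFSA2345435adfdasgsaed"
print(compute_cheap_hash(session_id))
```

The same input always yields the same 6 characters, which is what makes it useful for spotting a session across log lines.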

PS: Any algorithm can be used; MD5 is slightly cheaper to compute but hashlib.sha256 is also a popular choice.
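
To illustrate swapping the algorithm, here is a hedged sketch that takes the algorithm name as a parameter and passes it to hashlib.new. The function name cheap_hash_with and its defaults are hypothetical, and str input with UTF-8 encoding is assumed.

```python
import hashlib

def cheap_hash_with(txt, algorithm="sha256", length=6):
    # hashlib.new looks up the algorithm by name, e.g. "md5" or "sha256".
    digest = hashlib.new(algorithm, txt.encode("utf-8")).hexdigest()
    # Truncate to the requested number of hex digits.
    return digest[:length]

print(cheap_hash_with("some-session-id"))
print(cheap_hash_with("some-session-id", algorithm="md5"))
```

The set of names that will work on a given installation is listed in hashlib.algorithms_available.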

User contributions licensed under: CC BY-SA