I am looking for a robust way to hash/serialize the content of a method in Python.
Use case: we cache the result of a transformation function to a file, and it would be great if the cache could refresh automatically whenever the transformation function changes:
cached_file = get_cached_filename(
    data_version=data.version,
    transform_version=1,  # increment this when making updates
)

# Return the cached file if present
if cached_file.exists() and not overwrite:
    logger.debug("Reading dataset from local cache")
    return joblib.load(cached_file)

logger.debug("Downloading dataset and storing to local cache")

# Do the expensive data download and transform
df = download_and_apply_transforms(data)

# Store dataframe to local data cache
joblib.dump(df, cached_file)
return df
I am looking for something that could replace the hardcoded transform_version=1, ideally a hash read from the function itself.
I tried the built-in hash of the callable, i.e. download_and_apply_transforms.__hash__(), but the value changes on every run, so it is presumably derived from the object's memory address.
Answer
Good question. I'm not sure it is the best approach, but you can get the function's source code using the inspect module, like this:
inspect.getsource(foo)
That returns the source as a string, which you can then pass through a hashing function (for example one from hashlib) to get a cache key.
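Putting the two steps together, a minimal sketch could look like the following. The helper name function_version, the choice of SHA-256, the 16-character truncation and the bytecode fallback are all assumptions made for illustration, and the body of download_and_apply_transforms here is only a stand-in for the real transformation.

```python
import hashlib
import inspect


def function_version(func) -> str:
    """Return a short hex digest that changes when the source of `func` changes."""
    try:
        payload = inspect.getsource(func).encode("utf-8")
    except OSError:
        # Source text is unavailable (e.g. the function was defined in a REPL
        # or via exec). Falling back to the raw bytecode is an assumption of
        # this sketch; it is weaker, since constants are not part of co_code.
        payload = func.__code__.co_code
    # hashlib digests are stable across interpreter runs, unlike hash().
    return hashlib.sha256(payload).hexdigest()[:16]


def download_and_apply_transforms(data):
    # Stand-in for the real, expensive transformation.
    return sorted(data)


# Value that could be passed instead of the hardcoded transform_version=1.
transform_version = function_version(download_and_apply_transforms)
```

Keep in mind that the digest also changes when only comments or whitespace in the function change, and that it does not change when a helper called by the function is edited, so this probably works best for self-contained transforms.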