
Hashing the content of a method in Python

I am looking for a robust way to hash/serialize the content of a method in Python.

Use case: we cache the result of a transformation function to a file, and it would be great if the cache could refresh automatically whenever the transformation function changes:

cached_file = get_cached_filename(
    data_version=data.version,
    transform_version=1,  # increment this when making updates
)

# Return the cached file if present
if cached_file.exists() and not overwrite:
    logger.debug("Reading dataset from local cache")
    return joblib.load(cached_file)
    
logger.debug("Downloading dataset and storing to local cache")

# Do the expensive data download and transform
df = download_and_apply_transforms(data)

# Store dataframe to local data cache
joblib.dump(df, cached_file)

return df

I am looking for something that could replace the hardcoded version in transform_version=1 with a hash read from the method itself.

I tried the built-in hash of the callable, i.e. download_and_apply_transforms.__hash__(), but the value changes on each re-run, so it is likely based on the object's memory location.


Answer

Good question. I am not sure it is the best approach, but you can get the function's source code with the inspect module, like this:

inspect.getsource(foo)

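That returns the source as a string, which you can pass through a hashing function to get a cache key. Use a stable hash such as one from hashlib rather than the built-in hash(), since string hashes are randomized per process by default.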
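
As a minimal sketch of that idea, the helper below hashes a function's source with SHA-256 and returns a short hex string. The name source_hash and its length parameter are made up for illustration and are not part of any library; only inspect.getsource and hashlib.sha256 are standard.

```python
import hashlib
import inspect


def source_hash(func, length=12):
    """Return a short hex digest of a function's source code.

    Hypothetical helper: the name and the `length` parameter are
    illustrative choices, not an existing API.
    """
    # getsource returns the function's source text as a string
    source = inspect.getsource(func)
    # SHA-256 gives the same digest across runs, unlike the built-in hash()
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return digest[:length]


# Possible use in the question's snippet (assumes get_cached_filename
# accepts a string for transform_version):
#     transform_version=source_hash(download_and_apply_transforms)
```

A few caveats apply. The hash covers only the text of the function itself, so changes to helpers it calls will not change it, while edits to comments or whitespace inside the function will. inspect.getsource raises OSError when the source cannot be retrieved and TypeError for built-in functions.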

User contributions licensed under: CC BY-SA