
Type hints for a pandas DataFrame with mixed dtypes

I’ve been looking for robust type hints for a pandas DataFrame, but cannot seem to find anything useful. This question barely scratches the surface: “Pythonic type hints with pandas?”

Normally, if I want to hint the type of a function that takes a DataFrame as an input argument, I would do:
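A minimal sketch of such a plain annotation. The function name count_rows and the sample data are hypothetical, not taken from the original post. Note that the hint says nothing about the dtypes of the individual columns.

```python
import pandas as pd


def count_rows(df: pd.DataFrame) -> int:
    """Return the number of rows; the hint only says "some DataFrame"."""
    return len(df)


# Columns with different dtypes all satisfy the same annotation.
mixed = pd.DataFrame({"id": [1, 2], "price": [1.5, 2.0], "name": ["a", "b"]})
n_rows = count_rows(mixed)
```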

What I cannot seem to find is how to type hint a DataFrame with mixed dtypes. The DataFrame constructor's dtype argument accepts only a single dtype for the complete DataFrame. So, to my knowledge, per-column dtypes can only be set afterwards with pd.DataFrame().astype(dtype={}).
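As a sketch of that workaround, assuming hypothetical column names and sample values: astype accepts a mapping from column name to dtype, so mixed dtypes can be set after construction.

```python
import pandas as pd

# Hypothetical raw data where every column arrives as strings.
raw = pd.DataFrame({"id": ["1", "2"], "price": ["1.5", "2.0"], "name": ["a", "b"]})

# astype accepts a {column: dtype} mapping; columns not listed keep their dtype.
typed = raw.astype({"id": "int64", "price": "float64"})
```

This sets the dtypes at run time, but the annotation on any function that receives the result is still just pd.DataFrame.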

This works, but it doesn’t seem very Pythonic to me.

I came across this package: https://pypi.org/project/dataenforce/ with examples such as this one:

This looks somewhat promising, but sadly the project is old and buggy.

As a data scientist building a machine learning application with long ETL processes, I believe that type hints are important.

What do you use, and does anybody type hint their DataFrames in pandas?

Answer

I have now found the pandera library that seems very promising:

https://github.com/pandera-dev/pandera

It allows users to create schemas and use those schemas to create verbose checks. From their docs:

https://pandera.readthedocs.io/en/stable/schema_models.html

Also a note from them:

Due to current limitations in the pandas library (see discussion here), pandera annotations are only used for run-time validation and cannot be leveraged by static-type checkers like mypy. See the discussion here for more details.

But still, even though there is no static type checking, I think that this is going in a very good direction.
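To illustrate what run-time validation (as opposed to static checking) means, here is a minimal pandas-only sketch. It is not pandera's API: EXPECTED_DTYPES, validate_dtypes, and the column names are hypothetical and only stand in for the idea of checking a schema when the code runs.

```python
import pandas as pd

# Hypothetical schema: column name -> expected dtype string.
EXPECTED_DTYPES = {"year": "int64", "revenue": "float64"}


def validate_dtypes(df: pd.DataFrame, expected: dict) -> pd.DataFrame:
    """Raise TypeError if a column is missing or has an unexpected dtype."""
    for column, dtype in expected.items():
        if column not in df.columns:
            raise TypeError(f"missing column: {column!r}")
        actual = str(df[column].dtype)
        if actual != dtype:
            raise TypeError(f"column {column!r}: expected {dtype}, got {actual}")
    return df


good = pd.DataFrame({"year": [2020, 2021], "revenue": [1.5, 2.5]})
validate_dtypes(good, EXPECTED_DTYPES)  # passes silently

# Both frames are plain pd.DataFrame objects, so a static checker cannot
# tell them apart; the mismatch only surfaces when the check runs.
bad = pd.DataFrame({"year": ["2020", "2021"], "revenue": [1.5, 2.5]})
try:
    validate_dtypes(bad, EXPECTED_DTYPES)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
```

pandera offers this kind of check in a declarative form with richer validations; see its documentation for the actual API.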

User contributions licensed under: CC BY-SA