Type hints for a pandas DataFrame with mixed dtypes

Question

I've been looking for robust type hints for a pandas DataFrame but cannot seem to find anything useful. The question "Pythonic type hints with pandas?" barely scratches the surface. Normally, if I want to type-hint a function that takes a DataFrame as an input argument, I would annotate the parameter as pd.DataFrame. What I cannot seem to find is how do
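The plain annotation the question refers to (its original snippet did not survive) typically looks like the sketch below. The function `mean_age` and the column names `name` and `age` are hypothetical. The hint tells a type checker only that the argument is a `pd.DataFrame`; it says nothing about which columns exist or what dtype each one has, which is exactly the gap the question is about.

```python
import pandas as pd


# Hypothetical function: the annotation only promises "some DataFrame".
def mean_age(df: pd.DataFrame) -> float:
    # Nothing in the hint guarantees that an "age" column exists or is numeric;
    # a missing column only surfaces at run time as a KeyError.
    return float(df["age"].mean())


# Mixed dtypes: a string column and an integer column.
people = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 40]})
print(mean_age(people))  # 35.0

# A static checker accepts this call too, because it is still a DataFrame.
wrong = pd.DataFrame({"name": ["Ann"]})
try:
    mean_age(wrong)
except KeyError as exc:
    print("missing column:", exc)
```

Both calls type-check identically; only running the code reveals that the second frame lacks the column the function needs.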

Accepted Answer

I have now found the pandera library, which seems very promising: https://github.com/pandera-dev/pandera

It lets users create schemas and use those schemas to run detailed validation checks. From their docs (https://pandera.readthedocs.io/en/stable/schema_models.html):

```python
import pandas as pd
import pandera as pa
from pandera.typing import DataFrame, Series


# Newer pandera releases (0.14+) call this base class pa.DataFrameModel;
# SchemaModel is the older name used in the docs quoted here.
class InputSchema(pa.SchemaModel):
    # coerce=True converts the string values below to int before checking.
    year: Series[int] = pa.Field(gt=2000, coerce=True)
    month: Series[int] = pa.Field(ge=1, le=12, coerce=True)
    day: Series[int] = pa.Field(ge=0, le=365, coerce=True)


class OutputSchema(InputSchema):
    revenue: Series[float]


# check_types validates the argument against InputSchema and the
# return value against OutputSchema every time transform() is called.
@pa.check_types
def transform(df: DataFrame[InputSchema]) -> DataFrame[OutputSchema]:
    return df.assign(revenue=100.0)


df = pd.DataFrame({
    "year": ["2001", "2002", "2003"],
    "month": ["3", "6", "12"],
    "day": ["200", "156", "365"],
})
transform(df)  # passes validation

invalid_df = pd.DataFrame({
    "year": ["2001", "2002", "1999"],  # 1999 violates year > 2000
    "month": ["3", "6", "12"],
    "day": ["200", "156", "365"],
})
try:
    transform(invalid_df)
except pa.errors.SchemaError as exc:
    print(exc)
```

Also a note from them:

"Due to current limitations in the pandas library (see discussion here), pandera annotations are only used for run-time validation and cannot be leveraged by static-type checkers like mypy. See the discussion here for more details."

Still, even though there is no static type checking, I think this is going in a very good direction.


Answer