Dask DataFrame: convert all dtypes using a dictionary

ddf = dd.read_csv("data/csvs/*.part", dtype=better_dtypes)

Is there an easy equivalent way to convert all columns in a Dask DataFrame (converted from a pandas DataFrame) using a dictionary? I have a dictionary as follows:

better_dtypes = {
    "id1": "string[pyarrow]",
    "id2": "string[pyarrow]",
    "id3": "string[pyarrow]",
    "id4": "int64",
    "id5": "int64",
    "id6": "int64",
    "v1": "int64",
    "v2": "int64",
    "v3": "float64",
}

and would like to convert all dtypes of the pandas or Dask DataFrame at once to the dtypes suggested in the dictionary.

ddf = ddf.astype(better_dtypes).dtypes

Answer

Not sure if I understand the question correctly, but the conversion of dtypes can be done using .astype (as you wrote), except you would want to remove .dtypes from the assignment, since .dtypes returns the dtypes of the columns rather than the converted DataFrame:

# this will store the converted ddf
ddf = ddf.astype(better_dtypes)
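
To see the difference between the two assignments, here is a minimal pandas-only sketch. The sample values are made up, only a subset of the question's columns is used, and "string" stands in for "string[pyarrow]" so that pyarrow is not required. A Dask DataFrame accepts the same dictionary form, ddf.astype(better_dtypes).

```python
import pandas as pd

# Hypothetical sample data; the column names follow the question's dictionary.
df = pd.DataFrame({
    "id1": ["a", "b"],
    "id4": ["1", "2"],      # integers stored as strings
    "v3": ["0.5", "1.5"],   # floats stored as strings
})

# Subset of the question's mapping (assumption: "string" instead of
# "string[pyarrow]" so the example runs without pyarrow installed).
better_dtypes = {"id1": "string", "id4": "int64", "v3": "float64"}

# Keeps the converted DataFrame.
converted = df.astype(better_dtypes)

# Keeps only a Series describing the dtypes; the converted data is discarded.
dtype_summary = df.astype(better_dtypes).dtypes

print(type(converted).__name__)
print(type(dtype_summary).__name__)
print(converted.dtypes)
```

With the trailing .dtypes, the name on the left ends up bound to the dtype summary, so later DataFrame operations on it would fail.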