Error comparing dask date month with an integer

Question

The function passed to dask's map_partitions in the code below has a date field whose month is compared to an integer. This comparison fails with the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What does this error mean, and how can it be fixed?

Accepted Answer

When you use .map_partitions, each dask dataframe partition (which is a pandas dataframe) is passed to the function func2. As a result, obj.date2.dt.month refers to a Series, not a single value. Comparing it with an integer produces a boolean Series with one result per row, so wherever a single True/False is required (for example, in an if statement), it's not clear to Python how to determine the validity of the comparison.

As one option, below is a snippet that creates a new column, conditional on the dt.month result:

    import pandas as pd
    import dask
    import dask.dataframe as dd
    import datetime

    pdf = pd.DataFrame({
        'id2': [1, 1, 1, 2, 2],
        'balance': [150, 140, 130, 280, 260],
        'date2': [datetime.datetime(2021, 3, 1), datetime.datetime(2021, 4, 1),
                  datetime.datetime(2021, 5, 1), datetime.datetime(2021, 1, 1),
                  datetime.datetime(2021, 2, 1)]
    })

    ddf = dd.from_pandas(pdf, npartitions=1)

    def func2(obj):
        # obj is one partition, i.e. a pandas dataframe
        m = obj.date2.dt.month          # Series of month numbers, one per row
        obj.loc[m > 10, 'new_int'] = 1  # rows where the month is after October
        obj.loc[m <= 10, 'new_int'] = 2 # all remaining rows
        return obj

    ddf2 = ddf.map_partitions(func2)
    ddf2.compute()
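To make the ambiguity concrete, here is a minimal pandas-only sketch that treats a small dataframe as a single partition. It assumes the failing code used the comparison inside an if statement (the question's original function is not shown, so this is a guess at the cause). The name classify_month and the use of numpy.where are illustrative choices, not part of the original answer; the threshold of 10 mirrors func2 above.

```python
import datetime

import numpy as np
import pandas as pd

# One partition is just a pandas DataFrame, so pandas alone shows the problem.
part = pd.DataFrame({
    "date2": [datetime.datetime(2021, 3, 1), datetime.datetime(2021, 11, 1)],
})

m = part["date2"].dt.month  # a Series of month numbers, one per row
mask = m > 10               # element-wise comparison: a boolean Series

# Using the whole Series where a single True/False is needed raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True


def classify_month(obj):
    """Hypothetical variant of func2 that returns a copy with a new_int column."""
    obj = obj.copy()  # avoid mutating the partition that was passed in
    # Vectorized choice: 1 where the month is after October, otherwise 2.
    obj["new_int"] = np.where(obj["date2"].dt.month > 10, 1, 2)
    return obj


result = classify_month(part)
```

With dask, a function like this would be applied in the same way as func2, for example ddf.map_partitions(classify_month).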
