Pandas fill missing dates and values simultaneously for each group

Question

I have a dataframe (mydf) with monthly dates for each group, like below. I want to fill in the missing Dt values for each group, starting from that group's first date and running up to the maximum date in the Dt column, while simultaneously filling in 0 for the Sales column. So each group starts at its own start date but ends at the same overall maximum date.

Accepted Answer

Let's try:

- Get the minimum date per group using groupby.min.
- Add a new column called max to the aggregated mins, storing the maximum date of the whole frame via Series.max on Dt.
- Create an individual date_range per group based on the min and max values.
- Series.explode into rows to get a DataFrame that represents the new index.
- Create a MultiIndex.from_frame to reindex the DataFrame with.
- reindex with midx and set fill_value=0.

```python
# Get min per group
dates = mydf.groupby('Id')['Dt'].min().to_frame(name='min')
# Get max from the whole frame
dates['max'] = mydf['Dt'].max()
# Create MultiIndex with separate date ranges per group ('MS' = month start)
midx = pd.MultiIndex.from_frame(
    dates.apply(
        lambda x: pd.date_range(x['min'], x['max'], freq='MS'), axis=1
    ).explode().reset_index(name='Dt')[['Dt', 'Id']]
)
# Reindex, filling missing (Dt, Id) rows with 0
mydf = (
    mydf.set_index(['Dt', 'Id'])
        .reindex(midx, fill_value=0)
        .reset_index()
)
```

mydf:

```
           Dt Id  Sales
0  2020-10-01  A     47
1  2020-11-01  A     67
2  2020-12-01  A     46
3  2021-01-01  A      0
4  2021-02-01  A      0
5  2021-03-01  A      0
6  2021-04-01  A      0
7  2021-05-01  A      0
8  2021-06-01  A      0
9  2021-03-01  B      2
10 2021-04-01  B     42
11 2021-05-01  B     20
12 2021-06-01  B      4
```

DataFrame:

```python
import pandas as pd

mydf = pd.DataFrame({
    'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2020-10-01',
           '2020-11-01', '2020-12-01'],
    'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Sales': [2, 42, 20, 4, 47, 67, 46]
})
mydf['Dt'] = pd.to_datetime(mydf['Dt'])
```
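A possible variant of the same reindex idea, sketched here as an illustration rather than as part of the accepted answer: instead of DataFrame.apply plus Series.explode, it builds the (Dt, Id) pairs with a plain comprehension and passes them to pd.MultiIndex.from_tuples. It assumes the same sample data and a monthly ('MS') frequency; the names end, starts, and filled are illustrative only.

```python
import pandas as pd

# Sample data from the question (same as the answer's setup)
mydf = pd.DataFrame({
    'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2020-10-01',
           '2020-11-01', '2020-12-01'],
    'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
    'Sales': [2, 42, 20, 4, 47, 67, 46],
})
mydf['Dt'] = pd.to_datetime(mydf['Dt'])

# Every group ends at the overall maximum date...
end = mydf['Dt'].max()
# ...but starts at its own minimum date
starts = mydf.groupby('Id')['Dt'].min()

# Build (Dt, Id) pairs directly: one monthly ('MS' = month start) range per group
midx = pd.MultiIndex.from_tuples(
    [(d, gid) for gid, start in starts.items()
     for d in pd.date_range(start, end, freq='MS')],
    names=['Dt', 'Id'],
)

# Rows missing from mydf get Sales = 0
filled = (
    mydf.set_index(['Dt', 'Id'])
        .reindex(midx, fill_value=0)
        .reset_index()
)
print(filled)
```

Because the pairs are generated group by group, the result is ordered by Id and then by date, the same row order as the output shown in the answer.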


Answer