Output missing dates by group of columns

Question

I have a time series of `y` per store and product, stored in the following dataframe:

I would like to output all the missing dates per store and product and return the following result:
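The input frame itself is not reproduced above. As a hedged sketch, the following reconstructs it from the values printed in the accepted answer's Details output. The column order, row order, and integer dtype of `y` are assumptions, not the asker's original frame.

```python
import pandas as pd

# Hypothetical reconstruction: values taken from the accepted answer's
# "Details" output, one row per observed (store, product, ds) day.
df = pd.DataFrame({
    'store':   ['a'] * 6 + ['b'] * 6,
    'product': ['salt'] * 6 + ['pepper'] * 6,
    'ds': pd.to_datetime([
        # store a / salt: 2016-01-03 has no row
        '2016-01-01', '2016-01-02', '2016-01-04',
        '2016-01-05', '2016-01-06', '2016-01-07',
        # store b / pepper: 2016-01-05 has no row
        '2016-01-01', '2016-01-02', '2016-01-03',
        '2016-01-04', '2016-01-06', '2016-01-07',
    ]),
    'y': [2, 5, 3, 3, 4, 3,
          2, 2, 1, 2, 4, 2],
})
print(df)
```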

Accepted Answer

Use `groupby` followed by `resample`:

```python
# Assuming ds is datetime64, else use:
# df['ds'] = pd.to_datetime(df['ds'])
out = (df.groupby(['store', 'product']).resample('D', on='ds')['y']
         .first().loc[lambda x: x.isna()].index.to_frame(index=False))
print(out)

# Output
  store product         ds
0     a    salt 2016-01-03
1     b  pepper 2016-01-05
```

Details: the intermediate resampled series has one entry per day for each store/product, and the empty days show up as `NaN`:

```
>>> df.groupby(['store', 'product']).resample('D', on='ds')['y'].first()
store  product  ds        
a      salt     2016-01-01    2.0
                2016-01-02    5.0
                2016-01-03    NaN  # <- missing value == missing date
                2016-01-04    3.0
                2016-01-05    3.0
                2016-01-06    4.0
                2016-01-07    3.0
b      pepper   2016-01-01    2.0
                2016-01-02    2.0
                2016-01-03    1.0
                2016-01-04    2.0
                2016-01-05    NaN  # <- missing value == missing date
                2016-01-06    4.0
                2016-01-07    2.0
Name: y, dtype: float64
```

Update: If the `ds` column contains a date whose `y` value is missing, `first()` also returns `NaN` for that day, so it would be reported as a missing date. To avoid that, use `fillna({'y': 0})` before the `groupby`/`resample` step.
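A minimal self-contained sketch of the approach above, including the Update. The helper name `missing_dates` and the three-row frame are hypothetical, chosen so that one day (2016-01-02) has a row with a missing `y` and another day (2016-01-03) has no row at all.

```python
import numpy as np
import pandas as pd


def missing_dates(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: list (store, product, ds) days with no y value."""
    return (
        df.groupby(['store', 'product'])
          .resample('D', on='ds')['y']
          .first()                    # first non-NaN y per day; NaN if the day has none
          .loc[lambda s: s.isna()]    # keep only the empty days
          .index.to_frame(index=False)
    )


# Hypothetical data: 2016-01-02 has a row but no y; 2016-01-03 has no row at all.
df = pd.DataFrame({
    'store': ['a', 'a', 'a'],
    'product': ['salt', 'salt', 'salt'],
    'ds': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-04']),
    'y': [2.0, np.nan, 3.0],
})

raw = missing_dates(df)                       # flags both 2016-01-02 and 2016-01-03
filled = missing_dates(df.fillna({'y': 0}))   # flags only the absent 2016-01-03
print(raw)
print(filled)
```

Note that `resample` only spans each group's own first to last observed date, so days missing before a group's first row or after its last row are not reported. If a fixed calendar is needed, one option is to reindex each group against a `pd.date_range` covering the full period.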
