I have time series of y per store and product stored in the following dataframe:
            ds store product  y
0   2016-01-01     a    salt  2
1   2016-01-02     a    salt  5
2   2016-01-04     a    salt  3
3   2016-01-05     a    salt  3
4   2016-01-06     a    salt  4
5   2016-01-07     a    salt  3
6   2016-01-01     b  pepper  2
7   2016-01-02     b  pepper  2
8   2016-01-03     b  pepper  1
9   2016-01-04     b  pepper  2
10  2016-01-06     b  pepper  4
11  2016-01-07     b  pepper  2
I would like to find all the missing dates per store and product, and return the following result:
           ds store product
0  2016-01-03     a    salt
1  2016-01-05     b  pepper
Answer
Use groupby followed by resample (DataFrameGroupBy.resample):
# Assuming ds is datetime64; otherwise convert it first:
# df['ds'] = pd.to_datetime(df['ds'])

# Parentheses let the method chain continue onto a second line.
out = (df.groupby(['store', 'product']).resample('D', on='ds')['y']
         .first().loc[lambda x: x.isna()].index.to_frame(index=False))
print(out)

# Output:
#   store product         ds
# 0     a    salt 2016-01-03
# 1     b  pepper 2016-01-05
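To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby and resample chain. The variable names daily and missing are only illustrative, and the chain is split into two steps for readability.

```python
import pandas as pd

# Rebuild the sample data from the question.
df = pd.DataFrame({
    "ds": pd.to_datetime([
        "2016-01-01", "2016-01-02", "2016-01-04",
        "2016-01-05", "2016-01-06", "2016-01-07",
        "2016-01-01", "2016-01-02", "2016-01-03",
        "2016-01-04", "2016-01-06", "2016-01-07",
    ]),
    "store": ["a"] * 6 + ["b"] * 6,
    "product": ["salt"] * 6 + ["pepper"] * 6,
    "y": [2, 5, 3, 3, 4, 3, 2, 2, 1, 2, 4, 2],
})

# One row per calendar day between each group's first and last date;
# days that have no source row come back as NaN.
daily = df.groupby(["store", "product"]).resample("D", on="ds")["y"].first()

# Keep only the NaN days and turn the MultiIndex back into columns.
missing = daily[daily.isna()].index.to_frame(index=False)
print(missing)
```

Note that the resampling runs between each group's own first and last date, so a date missing before a group's first row or after its last row would not be reported by this approach.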
Details:
>>> df.groupby(['store', 'product']).resample('D', on='ds')['y'].first()
store  product  ds
a      salt     2016-01-01    2.0
                2016-01-02    5.0
                2016-01-03    NaN  # <- missing value == missing date
                2016-01-04    3.0
                2016-01-05    3.0
                2016-01-06    4.0
                2016-01-07    3.0
b      pepper   2016-01-01    2.0
                2016-01-02    2.0
                2016-01-03    1.0
                2016-01-04    2.0
                2016-01-05    NaN  # <- missing value == missing date
                2016-01-06    4.0
                2016-01-07    2.0
Name: y, dtype: float64
Update: If a date is present in the ds column but has no value in the y column, call fillna({'y': 0}) before the groupby and resample. Otherwise that date also yields NaN and is reported as missing.
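The effect of filling y first can be seen in a small sketch. The data below is hypothetical (not from the question): 2016-01-02 has a row with an empty y, while 2016-01-03 has no row at all. The helper missing_dates is an illustrative name that wraps the same chain used in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2016-01-02 exists but y is NaN; 2016-01-03 is absent.
df = pd.DataFrame({
    "ds": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-04"]),
    "store": ["a", "a", "a"],
    "product": ["salt", "salt", "salt"],
    "y": [2.0, np.nan, 3.0],
})

def missing_dates(frame):
    """Return the (store, product, ds) rows whose daily value is NaN."""
    daily = frame.groupby(["store", "product"]).resample("D", on="ds")["y"].first()
    return daily[daily.isna()].index.to_frame(index=False)

# Without filling, the empty y on 2016-01-02 is indistinguishable
# from the absent date 2016-01-03, so both are flagged.
without_fill = missing_dates(df)

# Filling y with 0 first leaves only the truly absent date.
with_fill = missing_dates(df.fillna({"y": 0}))

print(without_fill)
print(with_fill)
```

Filling with 0 is used here only to detect missing dates; if 0 is a meaningful value of y in your data, keep the filled copy separate from the frame you use for analysis.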