I want to add missing dates for a specific date range, but keep all columns. I found many posts using asfreq(), resample(), and reindex(), but they seemed to be for Series and I couldn't get them to work for my DataFrame.
Given a sample dataframe:
import pandas as pd

data = [{'id' : '123', 'product' : 'apple', 'color' : 'red', 'qty' : 10, 'week' : '2019-3-7'}, {'id' : '123', 'product' : 'apple', 'color' : 'blue', 'qty' : 20, 'week' : '2019-3-21'}, {'id' : '123', 'product' : 'orange', 'color' : 'orange', 'qty' : 8, 'week' : '2019-3-21'}]

df = pd.DataFrame(data)

    color   id product  qty       week
0     red  123   apple   10   2019-3-7
1    blue  123   apple   20  2019-3-21
2  orange  123  orange    8  2019-3-21
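My goal is to return the result below: every color/id/product combination gets a row for each week in the range, with qty filled in as 0 for the added weeks and the other columns carried over. Of course, I have many other ids. I would like to be able to specify the start/end dates to fill; this example uses 3/7 to 3/21.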
    color   id product  qty       week
0     red  123   apple   10   2019-3-7
1    blue  123   apple   20  2019-3-21
2  orange  123  orange    8  2019-3-21
3     red  123   apple    0  2019-3-14
4     red  123   apple    0  2019-3-21
5    blue  123   apple    0   2019-3-7
6    blue  123   apple    0  2019-3-14
7  orange  123  orange    0   2019-3-7
8  orange  123  orange    0  2019-3-14
How can I keep the remainder of my DataFrame intact?
Answer
In your case, you just need unstack and stack, plus reindex:
df.week = pd.to_datetime(df.week)
s = pd.date_range(df.week.min(), df.week.max(), freq='7D')

# Pivot the weeks into columns, add the missing weeks as 0, then pivot back
df = (df.set_index(['color', 'id', 'product', 'week'])
        .qty.unstack().reindex(columns=s, fill_value=0).stack().reset_index())
df

    color   id product    level_3     0
0    blue  123   apple 2019-03-14   0.0
1    blue  123   apple 2019-03-21  20.0
2  orange  123  orange 2019-03-14   0.0
3  orange  123  orange 2019-03-21   8.0
4     red  123   apple 2019-03-07  10.0
5     red  123   apple 2019-03-14   0.0
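Note that the output above has six rows rather than the nine the question asks for. This appears to be because unstack() leaves NaN where a combination has no row for an existing week, and stack() then drops those cells. A minimal sketch of one way to get all nine rows, assuming the same sample data: pass fill_value=0 to unstack() as well, and restore the column names afterwards. The names weeks and filled, and the hard-coded start/end dates, are illustrative choices and not part of the original answer.

```python
import pandas as pd

data = [
    {'id': '123', 'product': 'apple', 'color': 'red', 'qty': 10, 'week': '2019-3-7'},
    {'id': '123', 'product': 'apple', 'color': 'blue', 'qty': 20, 'week': '2019-3-21'},
    {'id': '123', 'product': 'orange', 'color': 'orange', 'qty': 8, 'week': '2019-3-21'},
]
df = pd.DataFrame(data)
df['week'] = pd.to_datetime(df['week'])

# Explicit start/end dates for the fill range (illustrative values from the question)
weeks = pd.date_range('2019-03-07', '2019-03-21', freq='7D')

filled = (
    df.set_index(['color', 'id', 'product', 'week'])['qty']
      .unstack('week', fill_value=0)         # existing weeks: missing cells become 0, not NaN
      .reindex(columns=weeks, fill_value=0)  # add weeks that appear nowhere in the data
      .rename_axis(columns='week')           # keep the name so reset_index() yields 'week'
      .stack()
      .rename('qty')                         # name the value column instead of 0
      .reset_index()
)
print(filled)
```

Because both unstack() and reindex() fill with an integer 0, qty should stay an integer column rather than being upcast to float.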