I have a dataframe (mydf) with dates for each group in monthly frequency like below:
JavaScript
x
9
1
Dt Id Sales
2
2021-03-01 B 2
3
2021-04-01 B 42
4
2021-05-01 B 20
5
2021-06-01 B 4
6
2020-10-01 A 47
7
2020-11-01 A 67
8
2020-12-01 A 46
9
I want to fill the dt for each group till the Maximum date within the date column starting from the date of Id while simultaneously filling in 0 for the Sales column. So each group starts at their own start date but ends at the same end date.
So for e.g. ID=A will start from 2020-10-01 and go all the way to 2021-06-03 and the value for the filled dates will be 0.
So the output will be
JavaScript
1
15
15
1
Dt Id Sales
2
2021-03-01 B 2
3
2021-04-01 B 42
4
2021-05-01 B 20
5
2021-06-01 B 4
6
2020-10-01 A 46
7
2020-11-01 A 47
8
2020-12-01 A 67
9
2021-01-01 A 0
10
2021-02-01 A 0
11
2021-03-01 A 0
12
2021-04-01 A 0
13
2021-05-01 A 0
14
2021-06-01 A 0
15
I have tried reindex but instead of adding daterange manually I want to use the dates in the groups.
My code is :
JavaScript
1
5
1
f = lambda x: x.reindex(pd.date_range('2020-10-01', '2021-06-01', freq='MS', name='Dt'))
2
mydf = mydf.set_index('Dt').groupby('Id').apply(f).drop('Id', axis=1).fillna(0)
3
mydf = mydf.reset_index()
4
5
Advertisement
Answer
Let’s try:
- Getting the minimum value per group using
groupby.min
- Add a new column to the aggregated mins called
max
which stores the maximum values from the frame usingSeries.max
onDt
- Create individual
date_range
per group based on themin
andmax
values Series.explode
into rows to have a DataFrame that represents the new index.- Create a
MultiIndex.from_frame
toreindex
the DataFrame with. reindex
withmidx
and set thefillvalue=0
JavaScript
1
19
19
1
# Get Min Per Group
2
dates = mydf.groupby('Id')['Dt'].min().to_frame(name='min')
3
# Get max from Frame
4
dates['max'] = mydf['Dt'].max()
5
6
# Create MultiIndex with separate Date ranges per Group
7
midx = pd.MultiIndex.from_frame(
8
dates.apply(
9
lambda x: pd.date_range(x['min'], x['max'], freq='MS'), axis=1
10
).explode().reset_index(name='Dt')[['Dt', 'Id']]
11
)
12
13
# Reindex
14
mydf = (
15
mydf.set_index(['Dt', 'Id'])
16
.reindex(midx, fill_value=0)
17
.reset_index()
18
)
19
mydf
:
JavaScript
1
15
15
1
Dt Id Sales
2
0 2020-10-01 A 47
3
1 2020-11-01 A 67
4
2 2020-12-01 A 46
5
3 2021-01-01 A 0
6
4 2021-02-01 A 0
7
5 2021-03-01 A 0
8
6 2021-04-01 A 0
9
7 2021-05-01 A 0
10
8 2021-06-01 A 0
11
9 2021-03-01 B 2
12
10 2021-04-01 B 42
13
11 2021-05-01 B 20
14
12 2021-06-01 B 4
15
DataFrame:
JavaScript
1
10
10
1
import pandas as pd
2
3
mydf = pd.DataFrame({
4
'Dt': ['2021-03-01', '2021-04-01', '2021-05-01', '2021-06-01', '2020-10-01',
5
'2020-11-01', '2020-12-01'],
6
'Id': ['B', 'B', 'B', 'B', 'A', 'A', 'A'],
7
'Sales': [2, 42, 20, 4, 47, 67, 46]
8
})
9
mydf['Dt'] = pd.to_datetime(mydf['Dt'])
10