Skip to content
Advertisement

Filling empty months in pandas dataframe not working

I have a pandas DataFrame exclusively with dates:

     year month day     date
0    2019    01  17  01-2019
1    2019    01  17  01-2019
2    2019    01  18  01-2019
3    2019    01  19  01-2019
4    2019    01  20  01-2019
...
336  2021    12  31  12-2021
337  2022    03  02  03-2022
338  2022    03  05  03-2022
339  2022    05  25  05-2022
340  2022    06  09  06-2022

Using groupby I get a count for the number of monthly occurrences as seen below:

gh =df.groupby([df['year'], df['month']]).agg({'count'}).reset_index()
print(gh)

>>>  year month   day  date
               count count
0   2019    01    10    10
1   2019    02     6     6
2   2019    03     8     8
3   2019    04     2     2
4   2019    05     7     7
5   2019    06     8     8
6   2019    07    10    10
7   2019    08     6     6
8   2019    09     6     6
9   2019    10     6     6
10  2019    11     4     4
11  2019    12     3     3
12  2020    01    12    12
13  2020    02     6     6
14  2020    03    22    22
15  2020    04    17    17
16  2020    05     4     4
17  2020    06     9     9
18  2020    07     6     6
19  2020    08     8     8
20  2020    09     4     4
21  2020    10     7     7
22  2020    11    15    15
23  2020    12    15    15
24  2021    01    18    18
25  2021    02    22    22
26  2021    03    15    15
27  2021    04    19    19
28  2021    05    16    16
29  2021    06    23    23
30  2021    07    19    19
31  2021    08     1     1
32  2021    12     3     3
33  2022    03     2     2
34  2022    05     1     1
35  2022    06     1     1

(date is only used for plotting reasons). My issue is, come 09-2021 I have zero monthly counts and I want to obtain my gh dataframe such that the missing rows look something like:

31  2021    08     1     1
32  2021    09     0     0 
33  2021    10     0     0 
34  2021    11     0     0 
35  2021    12     0     0 
...

All the way through 06-2022.

I encounter errors when I try using gh.reindex(pd.period_range(gh.index[0], gh.index[-1], freq='M')) from this solution, as the monthly indexes repeat. I also think because I’m only working with dates as my data and not actual variables, this is messing things up, but i am trying to plot each month from 01/2019 – 06/2022 including the 0 counts for months in late 2021 and early 2022. How do I make this work?

I know there are a lot of threads similar to this, but they all use actual data counts, and not just date counts.

Edit: Here is the output from df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 341 entries, 0 to 340
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   year    341 non-null    object
 1   month   341 non-null    object
 2   day     341 non-null    object
 3   date    341 non-null    object
dtypes: object(4)
memory usage: 10.8+ KB

Advertisement

Answer

Create a MultiIndex then reindex after groupby_agg:

mi = pd.MultiIndex.from_product([df['year'].unique(), range(1, 13)],
                                names=['year', 'month'])

gh = (df.groupby([df['year'], df['month']]).agg({'count'})
        .reindex(mi, fill_value=0).reset_index()
        .droplevel(level=1, axis=1)))

Output:

>>> gh
    year  month  day  date
0   2019      1    5     5
1   2019      2    0     0
2   2019      3    0     0
3   2019      4    0     0
4   2019      5    0     0
5   2019      6    0     0
6   2019      7    0     0
7   2019      8    0     0
8   2019      9    0     0
9   2019     10    0     0
10  2019     11    0     0
11  2019     12    0     0
12  2021      1    0     0
13  2021      2    0     0
14  2021      3    0     0
15  2021      4    0     0
16  2021      5    0     0
17  2021      6    0     0
18  2021      7    0     0
19  2021      8    0     0
20  2021      9    0     0
21  2021     10    0     0
22  2021     11    0     0
23  2021     12    1     1
24  2022      1    0     0
25  2022      2    0     0
26  2022      3    2     2
27  2022      4    0     0
28  2022      5    1     1
29  2022      6    1     1
30  2022      7    0     0
31  2022      8    0     0
32  2022      9    0     0
33  2022     10    0     0
34  2022     11    0     0
35  2022     12    0     0
User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement