Skip to content
Advertisement

expand row based on integer in column and split into number of months between dates

I have the following dataframe:

id date_start date_end reporting_month reporting_month_number months_length
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 1 3
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 1 4
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 1 6

I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month, like this:

id date_start date_end reporting_month reporting_month_number months_length
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 1 3
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-04 2 3
1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-05 3 3
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 1 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 2 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-04 3 4
2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-05 4 4
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 1 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-04 2 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-05 3 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-06 4 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-07 5 6
3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-08 6 6

I have tried several approaches but I can’t seem to reach my objective.

Does anyone have a suggestion on how to achieve this?

Thanks.

Advertisement

Answer

One possible solution is,

df= df.loc[df.index.repeat(df['months_length'])].reset_index(drop=True)
df['reporting_month_number'] = df.groupby('id')['reporting_month_number'].cumsum()

O/P:

    id           date_start             date_end reporting_month  
0    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
1    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
2    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
3    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
4    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
5    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
6    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
7    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
8    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
9    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
10   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
11   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
12   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   

    reporting_month_number  months_length  
0                        1              3  
1                        2              3  
2                        3              3  
3                        1              4  
4                        2              4  
5                        3              4  
6                        4              4  
7                        1              6  
8                        2              6  
9                        3              6  
10                       4              6  
11                       5              6  
12                       6              6  

​ Explanation:

  1. Repeat rows based on months_length
  2. Update Reporing Month Number based on groupby ‘id’
Advertisement