I have the following dataframe:
| id | date_start | date_end | reporting_month | reporting_month_number | months_length | 
|---|---|---|---|---|---|
| 1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 | 
| 2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 | 
I would like to split each id row so I can have a row for each of the months_length, starting on the date of reporting_month, like this:
| id | date_start | date_end | reporting_month | reporting_month_number | months_length | 
|---|---|---|---|---|---|
| 1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 | 
| 1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-04 | 2 | 3 | 
| 1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-05 | 3 | 3 | 
| 2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 | 
| 2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 2 | 4 | 
| 2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-04 | 3 | 4 | 
| 2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-05 | 4 | 4 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-04 | 2 | 6 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-05 | 3 | 6 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-06 | 4 | 6 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-07 | 5 | 6 | 
| 3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-08 | 6 | 6 | 
I have tried several approaches but I can’t seem to reach my objective.
Does anyone have a suggestion on how to achieve this?
Thanks.
Advertisement
Answer
One possible solution is,
df= df.loc[df.index.repeat(df['months_length'])].reset_index(drop=True)
df['reporting_month_number'] = df.groupby('id')['reporting_month_number'].cumsum()
O/P:
    id           date_start             date_end reporting_month  
0    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
1    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
2    1  2022-03-31 23:56:22  2022-05-01 23:56:22         2022-03   
3    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
4    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
5    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
6    2  2022-03-31 23:48:48  2022-06-01 23:48:48         2022-03   
7    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
8    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
9    3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
10   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
11   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
12   3  2022-03-31 23:47:36  2022-08-01 23:47:36         2022-03   
    reporting_month_number  months_length  
0                        1              3  
1                        2              3  
2                        3              3  
3                        1              4  
4                        2              4  
5                        3              4  
6                        4              4  
7                        1              6  
8                        2              6  
9                        3              6  
10                       4              6  
11                       5              6  
12                       6              6  
 Explanation:
- Repeat rows based on months_length
- Update Reporing Month Number based on groupby ‘id’