I have the following dataframe:
id | date_start | date_end | reporting_month | reporting_month_number | months_length |
---|---|---|---|---|---|
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 |
I would like to split each id row so I can have a row for each of the months_length
, starting on the date of reporting_month
, like this:
id | date_start | date_end | reporting_month | reporting_month_number | months_length |
---|---|---|---|---|---|
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-03 | 1 | 3 |
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-04 | 2 | 3 |
1 | 2022-03-31 23:56:22 | 2022-05-01 23:56:22 | 2022-05 | 3 | 3 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 1 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-03 | 2 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-04 | 3 | 4 |
2 | 2022-03-31 23:48:48 | 2022-06-01 23:48:48 | 2022-05 | 4 | 4 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-03 | 1 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-04 | 2 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-05 | 3 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-06 | 4 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-07 | 5 | 6 |
3 | 2022-03-31 23:47:36 | 2022-08-01 23:47:36 | 2022-08 | 6 | 6 |
I have tried several approaches but I can’t seem to reach my objective.
Does anyone have a suggestion on how to achieve this?
Thanks.
Advertisement
Answer
One possible solution is,
df= df.loc[df.index.repeat(df['months_length'])].reset_index(drop=True) df['reporting_month_number'] = df.groupby('id')['reporting_month_number'].cumsum()
O/P:
id date_start date_end reporting_month 0 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 1 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 2 1 2022-03-31 23:56:22 2022-05-01 23:56:22 2022-03 3 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 4 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 5 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 6 2 2022-03-31 23:48:48 2022-06-01 23:48:48 2022-03 7 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 8 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 9 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 10 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 11 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 12 3 2022-03-31 23:47:36 2022-08-01 23:47:36 2022-03 reporting_month_number months_length 0 1 3 1 2 3 2 3 3 3 1 4 4 2 4 5 3 4 6 4 4 7 1 6 8 2 6 9 3 6 10 4 6 11 5 6 12 6 6
Explanation:
- Repeat rows based on months_length
- Update Reporing Month Number based on groupby ‘id’