I’m trying to get a list of dates between two start and end date pandas columns, with individual IDs. I’ve found an answer that is basically what I’m after (https://stackoverflow.com/a/53409207/14463396) but this only works if the periods between the start and end dates don’t overlap, and I can’t seem to work out how to adapt this/use an alternative method to get the output I’m after.
Below is an example dataframe:
JavaScript
x
4
1
df = pd.DataFrame({'ID' : [1, 2, 3],
2
'Start Date': ["2021-06-01", "2021-06-02", "2021-06-04"],
3
'End Date': ["2021-06-03", "2021-06-05", "2021-06-08"]})
4
And this is what the desired output should be:
JavaScript
1
14
14
1
ID Start Date End Date Dates
2
0 1 2021-06-01 2021-06-03 2021-06-01
3
1 1 2021-06-01 2021-06-03 2021-06-02
4
2 1 2021-06-01 2021-06-03 2021-06-03
5
3 2 2021-06-02 2021-06-05 2021-06-02
6
4 2 2021-06-02 2021-06-05 2021-06-03
7
5 2 2021-06-02 2021-06-05 2021-06-04
8
6 2 2021-06-02 2021-06-05 2021-06-05
9
7 3 2021-06-04 2021-06-08 2021-06-04
10
8 3 2021-06-04 2021-06-08 2021-06-05
11
9 3 2021-06-04 2021-06-08 2021-06-06
12
10 3 2021-06-04 2021-06-08 2021-06-07
13
11 3 2021-06-04 2021-06-08 2021-06-08
14
Any help greatly appreciated :)
Advertisement
Answer
Try with create the date list then explode
JavaScript
1
20
20
1
df['Start Date'] = pd.to_datetime(df['Start Date'])
2
df['End Date'] = pd.to_datetime(df['End Date'])
3
df['Dates'] = [pd.date_range(x, y) for x , y in zip(df['Start Date'],df['End Date'])]
4
df = df.explode('Dates')
5
df
6
Out[108]:
7
ID Start Date End Date Dates
8
0 1 2021-06-01 2021-06-03 2021-06-01
9
0 1 2021-06-01 2021-06-03 2021-06-02
10
0 1 2021-06-01 2021-06-03 2021-06-03
11
1 2 2021-06-02 2021-06-05 2021-06-02
12
1 2 2021-06-02 2021-06-05 2021-06-03
13
1 2 2021-06-02 2021-06-05 2021-06-04
14
1 2 2021-06-02 2021-06-05 2021-06-05
15
2 3 2021-06-04 2021-06-08 2021-06-04
16
2 3 2021-06-04 2021-06-08 2021-06-05
17
2 3 2021-06-04 2021-06-08 2021-06-06
18
2 3 2021-06-04 2021-06-08 2021-06-07
19
2 3 2021-06-04 2021-06-08 2021-06-08
20