My data set is much larger so I have simplified it.
I want to convert the dataframe into a time-series.
The bit I am stuck on:
I have overlapping date ranges, with smaller date ranges nested inside a larger one: rows 1 and 2 both fall inside the date range of row 0.
df:
        date1        date2  reduction
0  2016-01-01 - 2016-01-05        7.0
1  2016-01-02 - 2016-01-03        5.0
2  2016-01-03 - 2016-01-04        6.0
3  2016-01-05 - 2016-01-12       10.0
How I want the output to look:
         date1       date2  reduction
0   2016-01-01  2016-01-02        7.0
1   2016-01-02  2016-01-03        5.0
2   2016-01-03  2016-01-04        6.0
3   2016-01-04  2016-01-05        7.0
4   2016-01-05  2016-01-06       10.0
5   2016-01-06  2016-01-07       10.0
6   2016-01-07  2016-01-08       10.0
7   2016-01-08  2016-01-09       10.0
8   2016-01-09  2016-01-10       10.0
9   2016-01-10  2016-01-11       10.0
10  2016-01-11  2016-01-12       10.0
Answer
I think this does what you want…
import datetime
import pandas as pd

# Sample data (note: not the same rows as in the question)
first = {'date1': [datetime.date(2016,1,1), datetime.date(2016,1,2), datetime.date(2016,1,6), datetime.date(2016,1,7),
                   datetime.date(2016,1,8), datetime.date(2016,1,9), datetime.date(2016,1,10), datetime.date(2016,1,11)],
         'date2': [datetime.date(2016,1,5), datetime.date(2016,1,3), datetime.date(2016,1,7), datetime.date(2016,1,8),
                   datetime.date(2016,1,9), datetime.date(2016,1,10), datetime.date(2016,1,11), datetime.date(2016,1,12)],
         'reduction': [7, 5, 3, 2, 9, 3, 8, 3]}
df = pd.DataFrame.from_dict(first)
# Convert to datetime64 so the joins against the DatetimeIndex below find matches
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])

# One row per day across the whole span
blank = pd.DataFrame(index=pd.date_range(df["date1"].min(), df["date2"].max()))
# r1: reduction of the range that starts on each day
blank["r1"] = blank.join(df[["date1", "reduction"]].set_index("date1"), how="left")["reduction"]
# r2: reduction of the range that ends on each day, then shifted back by one day
blank["r2"] = blank.join(df[["date2", "reduction"]].set_index("date2"), how="left")["reduction"]
blank["r2"] = blank["r2"].shift(-1)
# Rows where exactly one of r1/r2 is set mark the edges of an enclosing range
partial = blank.notna().any(axis=1) & blank.isna().any(axis=1)
tmp = blank[partial].reset_index().melt(id_vars=["index"])
tmp = tmp.sort_values(by="index").bfill()
# Back-fill the enclosing range's value across the days between its edges
blank1 = pd.DataFrame(index=pd.date_range(tmp["index"].min(), tmp["index"].max()))
tmp = blank1.join(tmp.set_index("index"), how="left").bfill().reset_index().groupby("index")["value"].first()
blank["r1"] = blank["r1"].combine_first(blank.join(tmp, how="left")["value"])
# Build consecutive daily intervals; the hard-coded 5 fills whatever is still missing in this sample
final = pd.DataFrame(data={"date1": blank.iloc[:-1, :].index, "date2": blank.iloc[1:, :].index, "reduction": blank["r1"].iloc[:-1].fillna(5).values})
print(final)
Output:
         date1       date2  reduction
0   2016-01-01  2016-01-02        7.0
1   2016-01-02  2016-01-03        5.0
2   2016-01-03  2016-01-04        7.0
3   2016-01-04  2016-01-05        7.0
4   2016-01-05  2016-01-06        5.0
5   2016-01-06  2016-01-07        3.0
6   2016-01-07  2016-01-08        2.0
7   2016-01-08  2016-01-09        9.0
8   2016-01-09  2016-01-10        3.0
9   2016-01-10  2016-01-11        8.0
10  2016-01-11  2016-01-12        3.0
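For comparison, here is a minimal alternative sketch run on the question's original four rows. It is not the approach used above. It rests on two assumptions that the question implies but never states: each range is half-open (date2 is excluded), and when ranges overlap, the narrowest range covering a day wins. The helper name expand_to_daily is hypothetical.

```python
import pandas as pd

# The question's sample data.
df = pd.DataFrame({
    "date1": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-03", "2016-01-05"]),
    "date2": pd.to_datetime(["2016-01-05", "2016-01-03", "2016-01-04", "2016-01-12"]),
    "reduction": [7.0, 5.0, 6.0, 10.0],
})


def expand_to_daily(df):
    """Hypothetical helper: one row per day, narrowest covering range wins."""
    # Process the widest ranges first so that narrower ones overwrite them.
    widths = df["date2"] - df["date1"]
    order = widths.sort_values(ascending=False, kind="mergesort").index

    # Every day from the earliest start up to, but excluding, the latest end.
    days = pd.date_range(df["date1"].min(), df["date2"].max(), freq="D")[:-1]
    daily = pd.Series(float("nan"), index=days)

    for i in order:
        row = df.loc[i]
        # Assumption: half-open interval [date1, date2).
        mask = (daily.index >= row["date1"]) & (daily.index < row["date2"])
        daily.loc[mask] = row["reduction"]

    return pd.DataFrame({
        "date1": daily.index,
        "date2": daily.index + pd.Timedelta(days=1),
        "reduction": daily.to_numpy(),
    })


result = expand_to_daily(df)
print(result)
```

On this sample the reduction column comes out as 7, 5, 6, 7 followed by seven 10s, which matches the output requested in the question. Days covered by no range would stay NaN rather than being filled with a constant. The loop is linear in the number of ranges, so a very large frame may need a vectorized variant.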