I have two dataframes. The first one is just a column of daily datetime, whereas the second one has both dates and data. This is an example:
df1 = pd.DataFrame({'Date': pd.date_range(start="2002-01-01",end="2021-04-14")})
df2 = pd.DataFrame.from_dict({'Date1': {0: '2002-01-01 00:00:00',
1: '2002-01-02 00:00:00',
2: '2002-01-03 00:00:00',
3: '2002-01-04 00:00:00',
4: '2002-01-07 00:00:00',
5: '2002-01-08 00:00:00',
6: '2002-01-09 00:00:00',
7: '2002-01-10 00:00:00',
8: '2002-01-11 00:00:00',
9: '2002-01-14 00:00:00',
10: '2002-01-15 00:00:00',
11: '2002-01-16 00:00:00',
12: '2002-01-17 00:00:00',
13: '2002-01-18 00:00:00',
14: '2002-01-21 00:00:00'},
'GSEAFCI': {0: 98.85,
1: 98.96,
2: 98.88,
3: 98.85,
4: 98.83,
5: 98.86,
6: 98.87,
7: 98.9,
8: 98.87,
9: 98.91,
10: 98.84,
11: 98.87,
12: 98.84,
13: 98.84,
14: 98.87},
'Date2': {0: '2004-03-03 00:00:00',
1: '2004-03-04 00:00:00',
2: '2004-03-05 00:00:00',
3: '2004-03-08 00:00:00',
4: '2004-03-09 00:00:00',
5: '2004-03-10 00:00:00',
6: '2004-03-11 00:00:00',
7: '2004-03-12 00:00:00',
8: '2004-03-15 00:00:00',
9: '2004-03-16 00:00:00',
10: '2004-03-17 00:00:00',
11: '2004-03-18 00:00:00',
12: '2004-03-19 00:00:00',
13: '2004-03-22 00:00:00',
14: '2004-03-23 00:00:00'},
'2Y2YINF': {0: 3.9,
1: 3.832,
2: 3.668,
3: 3.63,
4: 3.649,
5: 3.636,
6: 3.63,
7: 3.633,
8: 3.628,
9: 3.665,
10: 3.621,
11: 3.62,
12: 3.616,
13: 3.588,
14: 3.5949999999999998},
'Date3': {0: '2002-01-02 00:00:00',
1: '2002-01-03 00:00:00',
2: '2002-01-04 00:00:00',
3: '2002-01-07 00:00:00',
4: '2002-01-08 00:00:00',
5: '2002-01-09 00:00:00',
6: '2002-01-10 00:00:00',
7: '2002-01-11 00:00:00',
8: '2002-01-14 00:00:00',
9: '2002-01-15 00:00:00',
10: '2002-01-16 00:00:00',
11: '2002-01-17 00:00:00',
12: '2002-01-18 00:00:00',
13: '2002-01-21 00:00:00',
14: '2002-01-22 00:00:00'},
'5Y5YINF': {0: 5.9274000000000004,
1: 5.8767,
2: 5.8977,
3: 5.763,
4: 5.8056,
5: 5.825,
6: 5.7287,
7: 5.6922,
8: 5.6222,
9: 5.6145,
10: 5.6655999999999995,
11: 5.76,
12: 5.7455,
13: 5.718,
14: 5.7589}})
What I want to do is to merge df1 and df2 to get a dataframe (dataset) where: when the data exist it takes the date position; when it doesn’t exist, it just gets NaN. This is what I did:
dataset = df1.merge(df2, how = 'left', left_on= ['Date', 'Date','Date'],
right_on=['Date1','Date2','Date3'])
Unfortunately, I get only NaNs or NaTs everywhere.
Instead, I would like to get:
Date GSEAFCI 2Y2YINF 5Y5YINF 0 2002-01-01 00:00:00 98.85 NaN 5.9274 1 2002-01-02 00:00:00 98.96 NaN 5.8767 2 2002-01-03 00:00:00 98.88 NaN 5.8977 3 2002-01-04 00:00:00 98.85 NaN 5.7630 4 2002-01-05 00:00:00 NaN NaN NaN 5 2002-01-06 00:00:00 NaN NaN NaN 6 2002-01-07 00:00:00 98.83 NaN 5.8056 7 2002-01-08 00:00:00 98.86 NaN 5.8250 8 2002-01-09 00:00:00 98.87 NaN 5.7287 ... ...
Can anyone help me with this?
Thanks!
Advertisement
Answer
If there si same pattern – datetimes with data columns is possible use list comprehension with concat, then convert values to datetimes in index and DataFrame.merge by first DataFrame (by index name Date and column name Date):
a = df2.columns[::2]
b = df2.columns[1::2]
dfs = [df2.loc[:, x].set_index(x[0], drop=False)[x[1]] for x in zip(a, b)]
df = pd.concat(dfs, axis=1, sort=False).rename_axis('Date')
df.index = pd.to_datetime(df.index)
df = df1.merge(df, on='Date', how='left')
print (df.head(10))
Date GSEAFCI 2Y2YINF 5Y5YINF
0 2002-01-01 98.85 NaN NaN
1 2002-01-02 98.96 NaN 5.9274
2 2002-01-03 98.88 NaN 5.8767
3 2002-01-04 98.85 NaN 5.8977
4 2002-01-05 NaN NaN NaN
5 2002-01-06 NaN NaN NaN
6 2002-01-07 98.83 NaN 5.7630
7 2002-01-08 98.86 NaN 5.8056
8 2002-01-09 98.87 NaN 5.8250
9 2002-01-10 98.90 NaN 5.7287