I have two dataframes. The first is just a single column of daily datetimes, whereas the second holds several pairs of date and data columns. This is an example:
import pandas as pd

df1 = pd.DataFrame({'Date': pd.date_range(start="2002-01-01", end="2021-04-14")})

df2 = pd.DataFrame.from_dict({
    'Date1': {0: '2002-01-01 00:00:00', 1: '2002-01-02 00:00:00', 2: '2002-01-03 00:00:00', 3: '2002-01-04 00:00:00', 4: '2002-01-07 00:00:00', 5: '2002-01-08 00:00:00', 6: '2002-01-09 00:00:00', 7: '2002-01-10 00:00:00', 8: '2002-01-11 00:00:00', 9: '2002-01-14 00:00:00', 10: '2002-01-15 00:00:00', 11: '2002-01-16 00:00:00', 12: '2002-01-17 00:00:00', 13: '2002-01-18 00:00:00', 14: '2002-01-21 00:00:00'},
    'GSEAFCI': {0: 98.85, 1: 98.96, 2: 98.88, 3: 98.85, 4: 98.83, 5: 98.86, 6: 98.87, 7: 98.9, 8: 98.87, 9: 98.91, 10: 98.84, 11: 98.87, 12: 98.84, 13: 98.84, 14: 98.87},
    'Date2': {0: '2004-03-03 00:00:00', 1: '2004-03-04 00:00:00', 2: '2004-03-05 00:00:00', 3: '2004-03-08 00:00:00', 4: '2004-03-09 00:00:00', 5: '2004-03-10 00:00:00', 6: '2004-03-11 00:00:00', 7: '2004-03-12 00:00:00', 8: '2004-03-15 00:00:00', 9: '2004-03-16 00:00:00', 10: '2004-03-17 00:00:00', 11: '2004-03-18 00:00:00', 12: '2004-03-19 00:00:00', 13: '2004-03-22 00:00:00', 14: '2004-03-23 00:00:00'},
    '2Y2YINF': {0: 3.9, 1: 3.832, 2: 3.668, 3: 3.63, 4: 3.649, 5: 3.636, 6: 3.63, 7: 3.633, 8: 3.628, 9: 3.665, 10: 3.621, 11: 3.62, 12: 3.616, 13: 3.588, 14: 3.5949999999999998},
    'Date3': {0: '2002-01-02 00:00:00', 1: '2002-01-03 00:00:00', 2: '2002-01-04 00:00:00', 3: '2002-01-07 00:00:00', 4: '2002-01-08 00:00:00', 5: '2002-01-09 00:00:00', 6: '2002-01-10 00:00:00', 7: '2002-01-11 00:00:00', 8: '2002-01-14 00:00:00', 9: '2002-01-15 00:00:00', 10: '2002-01-16 00:00:00', 11: '2002-01-17 00:00:00', 12: '2002-01-18 00:00:00', 13: '2002-01-21 00:00:00', 14: '2002-01-22 00:00:00'},
    '5Y5YINF': {0: 5.9274000000000004, 1: 5.8767, 2: 5.8977, 3: 5.763, 4: 5.8056, 5: 5.825, 6: 5.7287, 7: 5.6922, 8: 5.6222, 9: 5.6145, 10: 5.6655999999999995, 11: 5.76, 12: 5.7455, 13: 5.718, 14: 5.7589},
})
What I want to do is merge df1 and df2 into a single dataframe (dataset) in which each series is aligned to the dates in df1: where data exists for a date, the value goes in that date's row; where it doesn't, the row just gets NaN. This is what I did:
dataset = df1.merge(df2, how = 'left', left_on= ['Date', 'Date','Date'], right_on=['Date1','Date2','Date3'])
Unfortunately, I get only NaNs or NaTs everywhere.
Instead, I would like to get:
                  Date  GSEAFCI  2Y2YINF  5Y5YINF
0  2002-01-01 00:00:00    98.85      NaN   5.9274
1  2002-01-02 00:00:00    98.96      NaN   5.8767
2  2002-01-03 00:00:00    98.88      NaN   5.8977
3  2002-01-04 00:00:00    98.85      NaN   5.7630
4  2002-01-05 00:00:00      NaN      NaN      NaN
5  2002-01-06 00:00:00      NaN      NaN      NaN
6  2002-01-07 00:00:00    98.83      NaN   5.8056
7  2002-01-08 00:00:00    98.86      NaN   5.8250
8  2002-01-09 00:00:00    98.87      NaN   5.7287
...
Can anyone help me with this?
Thanks!
Answer
If the same pattern holds throughout (each datetime column is followed by its data column), you can use a list comprehension with concat, then convert the index values to datetimes and use DataFrame.merge with the first DataFrame (matching the index name Date against the column name Date):
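# Date columns sit at even positions, value columns at odd positions
a = df2.columns[::2]
b = df2.columns[1::2]
# For each (date, value) pair, build a Series of values indexed by that pair's dates
dfs = [df2.loc[:, x].set_index(x[0], drop=False)[x[1]] for x in zip(a, b)]
# Align all Series on the union of their dates and name the index 'Date'
df = pd.concat(dfs, axis=1, sort=False).rename_axis('Date')
# The dates are strings in df2, so convert the index to datetimes before merging
df.index = pd.to_datetime(df.index)
df = df1.merge(df, on='Date', how='left')
print(df.head(10))

        Date  GSEAFCI  2Y2YINF  5Y5YINF
0 2002-01-01    98.85      NaN      NaN
1 2002-01-02    98.96      NaN   5.9274
2 2002-01-03    98.88      NaN   5.8767
3 2002-01-04    98.85      NaN   5.8977
4 2002-01-05      NaN      NaN      NaN
5 2002-01-06      NaN      NaN      NaN
6 2002-01-07    98.83      NaN   5.7630
7 2002-01-08    98.86      NaN   5.8056
8 2002-01-09    98.87      NaN   5.8250
9 2002-01-10    98.90      NaN   5.7287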
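If the concat step feels opaque, the same idea can also be written as a loop of left merges, one per date/value pair. This is only a sketch under assumptions, not part of the approach above: the helper name align_pairs and the small stand-in frames calendar and wide are made up for illustration, and it assumes the columns strictly alternate date, value, date, value and that dates are not repeated within a pair (repeated dates would duplicate rows in the result).

```python
import pandas as pd

# Small stand-in for df1: a continuous daily calendar.
calendar = pd.DataFrame({"Date": pd.date_range("2002-01-01", "2002-01-05")})

# Small stand-in for df2: alternating (date, value) column pairs,
# with the dates stored as strings, as in the question.
wide = pd.DataFrame({
    "Date1": ["2002-01-01 00:00:00", "2002-01-02 00:00:00", "2002-01-04 00:00:00"],
    "A": [1.0, 2.0, 3.0],
    "Date2": ["2002-01-02 00:00:00", "2002-01-03 00:00:00", "2002-01-04 00:00:00"],
    "B": [10.0, 20.0, 30.0],
})


def align_pairs(calendar, wide, date_col="Date"):
    """Left-join every (date, value) column pair of `wide` onto `calendar`."""
    result = calendar.copy()
    cols = list(wide.columns)
    # Walk the columns two at a time: even positions hold dates, odd positions values.
    for date_name, value_name in zip(cols[::2], cols[1::2]):
        pair = wide[[date_name, value_name]].dropna(subset=[date_name]).copy()
        # Parse the string dates so the key matches the calendar's datetime column.
        pair[date_name] = pd.to_datetime(pair[date_name])
        # Give every pair the same key name so each merge uses a single 'Date' key.
        pair = pair.rename(columns={date_name: date_col})
        result = result.merge(pair, on=date_col, how="left")
    return result


aligned = align_pairs(calendar, wide)
print(aligned)
```

With the stand-in data, A comes back as NaN on 2002-01-03 and 2002-01-05, and B as NaN on 2002-01-01 and 2002-01-05, because those dates are missing from the respective pair. Each series is matched to the calendar on its own date column, rather than requiring one row of the wide frame to match on all date columns at once.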