Compare two DataFrames and find missing timestamps

Question

I have two DataFrames, df1 and df2, and df2 is missing some timestamps compared to df1. I am already able to find those timestamps. I want to insert the missing timestamps into df2 and fill each column with the average of the values in the two previous rows. So the new df2 …

Accepted Answer

Not an ideal solution, since it iterates over the rows:

    import numpy as np
    import pandas as pd

    df1 = [
        ['2023-01-01 16:00:00', 100],
        ['2023-01-01 16:15:00', 120],
        ['2023-01-01 16:30:00', 140],
        ['2023-01-01 16:45:00', 160],
        ['2023-01-01 17:00:00', 200],
        ['2023-01-01 17:15:00', 430],
        ['2023-01-01 17:30:00', 890],
    ]
    df2 = [
        ['2023-01-01 16:00:00', 100, 200],
        ['2023-01-01 16:15:00', 120, 400],
        ['2023-01-01 17:00:00', 200, 500],
    ]
    df1 = pd.DataFrame(df1, columns=['date', 'col1'])
    df2 = pd.DataFrame(df2, columns=['date', 'col2', 'col3'])

    # Rows of df1 whose timestamp is absent from df2; keep only the date column.
    missing = df1[~df1['date'].isin(df2['date'])]
    missing = missing.drop(['col1'], axis=1)

    # Append the missing timestamps (col2 and col3 become NaN) and restore time order.
    merged = pd.concat([df2, missing])
    merged.sort_values('date', inplace=True, ignore_index=True)

    # Fill each NaN with the mean of the two preceding rows. Earlier fills feed
    # later ones. This assumes at least two rows precede the first gap; otherwise
    # the negative positions wrap around to the end of the frame.
    for index, row in merged.iterrows():
        if np.isnan(row['col2']):
            # merged.at[index, col] writes to the frame itself, not to a temporary Series.
            merged.at[index, 'col2'] = merged['col2'].iloc[[index - 1, index - 2]].mean()
        if np.isnan(row['col3']):
            merged.at[index, 'col3'] = merged['col3'].iloc[[index - 1, index - 2]].mean()

    print(merged)

Output:

date	col2	col3
2023-01-01 16:00:00	100.00	200.0
2023-01-01 16:15:00	120.00	400.0
2023-01-01 16:30:00	110.00	300.0
2023-01-01 16:45:00	115.00	350.0
2023-01-01 17:00:00	200.00	500.0
2023-01-01 17:15:00	157.50	425.0
2023-01-01 17:30:00	178.75	462.5
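
If you need this for more than two columns, the same idea can be wrapped in a function. This is only a sketch, not a tested general solution: the name `fill_missing_with_prev_mean` and its `key` and `window` parameters are made up for illustration, it assumes the key column has no duplicates, and it swaps the `isin`/`concat` step for an outer merge on the key. Gaps with fewer than `window` rows before them are left as NaN rather than wrapping around.

```python
import numpy as np
import pandas as pd


def fill_missing_with_prev_mean(reference, target, key="date", window=2):
    """Add rows for keys found in `reference` but not in `target`, then fill
    each value column with the mean of the `window` preceding rows."""
    # An outer merge on the key alone adds the missing keys with NaN values.
    merged = (
        reference[[key]]
        .merge(target, on=key, how="outer")
        .sort_values(key, ignore_index=True)
    )
    value_cols = [c for c in merged.columns if c != key]
    for col in value_cols:
        values = merged[col].to_numpy(dtype=float, copy=True)
        for i in range(len(values)):
            # Skip gaps that do not have `window` rows before them.
            if np.isnan(values[i]) and i >= window:
                # Earlier fills are already in `values`, so they feed later ones.
                values[i] = values[i - window:i].mean()
        merged[col] = values
    return merged


# Usage with the sample data from the answer above.
df1 = pd.DataFrame(
    [
        ["2023-01-01 16:00:00", 100],
        ["2023-01-01 16:15:00", 120],
        ["2023-01-01 16:30:00", 140],
        ["2023-01-01 16:45:00", 160],
        ["2023-01-01 17:00:00", 200],
        ["2023-01-01 17:15:00", 430],
        ["2023-01-01 17:30:00", 890],
    ],
    columns=["date", "col1"],
)
df2 = pd.DataFrame(
    [
        ["2023-01-01 16:00:00", 100, 200],
        ["2023-01-01 16:15:00", 120, 400],
        ["2023-01-01 17:00:00", 200, 500],
    ],
    columns=["date", "col2", "col3"],
)
result = fill_missing_with_prev_mean(df1, df2)
print(result)
```

On the sample data this should give the same col2 and col3 values as the table above. The fill is still sequential, because each filled value can depend on the one filled just before it, so the inner loop runs over a NumPy array instead of calling `iterrows`.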

Answer