Merging two dataframes on timestamp while preserving all data

Question

I want to merge two dataframes to create a single time series with two variables. I have a function that does this by iterating over each dataframe using iterrows(), which is terribly slow and doesn't take advantage of the vectorization that pandas and numpy provide. Would you be able to help? This code illustrates what I am trying to do:

Accepted Answer

This can be broken down into 2 steps:

1. The first step is the equivalent of an outer join in SQL, where you create a table containing the keys of both source tables. This is done with merge(..., how="outer").
2. The second is filling the NaN values with the previous non-NaN values, which can be done with ffill().

z = a.merge(b, on="timestamp", how="outer").sort_values("timestamp").ffill()
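To see the two steps on concrete data, here is a minimal sketch. The sample frames and the column names "x" and "y" are made up for illustration, since the original data isn't shown; only the "timestamp" column name comes from the one-liner above. The reset_index(drop=True) call is an optional extra that tidies the index after sorting.

```python
import pandas as pd

# Hypothetical sample data: the values and the "x"/"y" column names are assumptions.
a = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 00:02", "2021-01-01 00:04"]
    ),
    "x": [1.0, 2.0, 3.0],
})
b = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2021-01-01 00:01", "2021-01-01 00:02", "2021-01-01 00:05"]
    ),
    "y": [10.0, 20.0, 30.0],
})

z = (
    a.merge(b, on="timestamp", how="outer")  # step 1: keep timestamps from both frames
     .sort_values("timestamp")               # put rows in time order before filling
     .ffill()                                # step 2: carry the last known value forward
     .reset_index(drop=True)                 # optional: clean 0..n-1 index
)
print(z)
```

Note that any rows before the first observation of a variable stay NaN, because there is no earlier value to carry forward. Here that is the first "y" value.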
