How to improve the performance of matching dataframe slices?

I need to improve the performance of the dataframe slice matching shown below. The goal is to find the trips that match between two dataframes, based on the values in the sequence column, with their order preserved.

My 2 dataframes:

JavaScript

Expected output:

JavaScript

This is the code I'm using:

JavaScript

Although it works, this is very slow and inefficient, as my real dataframes are longer. Any suggestions?

Answer

You can aggregate each trip as a tuple with groupby.agg, then merge the two outputs to identify the identical routes:

JavaScript

output:

JavaScript
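
As a minimal sketch of the aggregate-then-merge approach described above: the column names `trip` and `sequence`, the sample data, and the helper `trips_as_tuples` are all assumptions made for illustration, not taken from the original dataframes.

```python
import pandas as pd

# Hypothetical sample data; the column names `trip` and `sequence` are assumptions.
df1 = pd.DataFrame({
    "trip": [1, 1, 1, 2, 2, 3, 3],
    "sequence": ["a", "b", "c", "a", "c", "b", "c"],
})
df2 = pd.DataFrame({
    "trip": [10, 10, 10, 20, 20, 30, 30, 30],
    "sequence": ["a", "b", "c", "b", "c", "a", "b", "c"],
})

def trips_as_tuples(df, name):
    # Collapse each trip into one row whose `sequence` value is an
    # ordered, hashable tuple of that trip's sequence values.
    return (
        df.groupby("trip", sort=False)["sequence"]
          .agg(tuple)
          .reset_index()
          .rename(columns={"trip": name})
    )

routes1 = trips_as_tuples(df1, "trip_df1")
routes2 = trips_as_tuples(df2, "trip_df2")

# An inner merge on the tuple column keeps only the trips whose
# sequences are identical, element by element and in the same order.
matches = routes1.merge(routes2, on="sequence")
print(matches)
```

Because each trip is reduced to a single row before the merge, the comparison runs once per trip rather than once per pair of slices, which is likely where most of the speed-up comes from.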

If you only want the first match, apply drop_duplicates to the output of the df2 aggregation to prevent unnecessary merging:

JavaScript

output:

JavaScript
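
The first-match variant could be sketched as follows, again assuming hypothetical `trip` and `sequence` columns and made-up sample data. Duplicate routes are dropped from the df2 aggregation before merging, so each df1 trip pairs with at most one df2 trip.

```python
import pandas as pd

# Hypothetical sample data; the column names `trip` and `sequence` are assumptions.
df1 = pd.DataFrame({
    "trip": [1, 1, 1, 2, 2, 3, 3],
    "sequence": ["a", "b", "c", "a", "c", "b", "c"],
})
df2 = pd.DataFrame({
    "trip": [10, 10, 10, 20, 20, 30, 30, 30],
    "sequence": ["a", "b", "c", "b", "c", "a", "b", "c"],
})

# One row per trip, with the ordered sequence stored as a tuple.
routes1 = (
    df1.groupby("trip", sort=False)["sequence"].agg(tuple)
       .reset_index().rename(columns={"trip": "trip_df1"})
)
routes2 = (
    df2.groupby("trip", sort=False)["sequence"].agg(tuple)
       .reset_index().rename(columns={"trip": "trip_df2"})
)

# Keep only the first df2 trip for each distinct route, so the merge
# cannot produce more than one match per df1 trip.
first_routes2 = routes2.drop_duplicates(subset="sequence")

first_matches = routes1.merge(first_routes2, on="sequence")
print(first_matches)
```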