
Check if value in pandas dataframe is within any two values of two other columns in another dataframe

I have two dataframes of different lengths: dfSamples (63,012,375 rows) and dfFixations (200,000 rows).


I would like to check whether each value in dfSamples falls within any of the ranges defined by two columns of dfFixations and, if so, assign a label to it. I found Check if value in a dataframe is between two values in another dataframe, but the loop solution there is terribly slow, and I could not get any of the other solutions to work.

Working (but very slow) example:

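A minimal sketch of this kind of loop, assuming hypothetical column names (`time` in dfSamples, `start`/`end` in dfFixations), inclusive bounds, and tiny made-up data:

```python
import pandas as pd

# Hypothetical data; the column names 'time', 'start' and 'end' are assumptions.
dfSamples = pd.DataFrame({"time": [1, 3, 5, 7, 9, 11]})
dfFixations = pd.DataFrame({"start": [2, 8], "end": [4, 10]})

# Start with an empty label, then loop over every fixation interval.
dfSamples["label"] = ""
for row in dfFixations.itertuples(index=False):
    # Boolean mask over ALL samples for this one interval (both bounds inclusive).
    mask = dfSamples["time"].between(row.start, row.end)
    dfSamples.loc[mask, "label"] = "fixation"
```

Each iteration scans every sample, so the total work grows with (number of samples) × (number of fixations). At 63 million × 200,000 that is what makes the loop so slow.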

Following Performance of Pandas apply vs np.vectorize to create new column from existing columns, I tried to vectorize this, but without success.

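For reference, the NumPy documentation describes np.vectorize as a convenience wrapper whose implementation is essentially a for loop, so it cannot make this faster. Real vectorization means broadcasting the comparison over every sample/interval pair. A minimal sketch with hypothetical arrays `times`, `starts` and `ends` (stand-ins for the dataframe columns):

```python
import numpy as np

# Hypothetical sample times and interval bounds (not taken from the question).
times = np.array([1, 3, 5, 7, 9, 11])
starts = np.array([2, 8])
ends = np.array([4, 10])

# Broadcast to shape (n_samples, n_intervals): True where start <= time <= end.
inside = (times[:, None] >= starts) & (times[:, None] <= ends)

# A sample gets the label if it falls inside at least one interval.
labels = np.where(inside.any(axis=1), "fixation", "")
```

The catch is memory: `inside` holds one boolean per sample/interval pair, which for 63,012,375 × 200,000 is about 1.26 × 10^13 values (roughly 12.6 TB). This only works on small inputs or in chunks, and the total work is still proportional to samples × intervals.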

Would appreciate any help!


Answer

Use IntervalIndex.from_arrays together with IntervalIndex.get_indexer, which returns -1 for values that match no interval; check for -1 and set the output with numpy.where:

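A minimal sketch of this approach, assuming hypothetical columns (`time` in dfSamples, `start`/`end` in dfFixations) and inclusive bounds via `closed="both"`. It also assumes the fixation intervals do not overlap: `get_indexer` on an IntervalIndex is expected to raise an error for overlapping intervals, in which case `IntervalIndex.get_indexer_non_unique` is the alternative to look at.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the column names are assumptions.
dfSamples = pd.DataFrame({"time": [1, 3, 5, 7, 9, 11]})
dfFixations = pd.DataFrame({"start": [2, 8], "end": [4, 10]})

# One interval per fixation row; closed="both" makes both bounds inclusive.
idx = pd.IntervalIndex.from_arrays(dfFixations["start"], dfFixations["end"], closed="both")

# For each sample: position of the containing interval, or -1 if there is none.
pos = idx.get_indexer(dfSamples["time"])

# Label matched samples in a single vectorized step.
dfSamples["label"] = np.where(pos != -1, "fixation", "")
```

Because `pos` holds the matching row position in dfFixations, it can also be used to look up other per-fixation values for the matched samples.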

Performance: measured on idealized data (sorted, non-overlapping intervals). With real data the performance may differ, so it is best to test it yourself.


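A rough benchmark sketch for comparing the two approaches on your own machine. The data is synthetic and idealized (sorted, non-overlapping intervals, sizes chosen arbitrarily), and the baseline loop uses plain NumPy masks rather than pandas `.loc`, so treat the printed timings as indicative only:

```python
import time

import numpy as np
import pandas as pd

# Synthetic, idealized data (an assumption): interval i covers [500*i, 500*i + 249],
# so the intervals are sorted and never overlap.
n_fix, n_samples = 200, 100_000
starts = np.arange(n_fix) * 500
ends = starts + 249
rng = np.random.default_rng(0)
times = rng.integers(0, n_fix * 500, n_samples)


def in_any_loop(t, starts, ends):
    """Baseline: one boolean mask over all samples per interval."""
    inside = np.zeros(len(t), dtype=bool)
    for s, e in zip(starts, ends):
        inside |= (t >= s) & (t <= e)
    return inside


def in_any_interval_index(t, starts, ends):
    """Single IntervalIndex lookup; -1 means no containing interval."""
    idx = pd.IntervalIndex.from_arrays(starts, ends, closed="both")
    return idx.get_indexer(t) != -1


# Time each approach once on the same data.
for fn in (in_any_loop, in_any_interval_index):
    t0 = time.perf_counter()
    fn(times, starts, ends)
    print(f"{fn.__name__}: {time.perf_counter() - t0:.4f} s")
```

Swap in a slice of the real dfSamples/dfFixations before drawing conclusions; unsorted or overlapping intervals can change the picture.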
User contributions licensed under: CC BY-SA