
Python: a faster method of finding indexes in a list of 2 million+ items that match a string condition


Hello, I have shared the reproducible code above.

Background: Let me quickly introduce my data. station_combinations is the cross product of “my_list” with itself, with each pair joined by “_”. You can think of each element as a trip between two “my_list” items, so 1_2 means going from 1 to 2, whereas 2_1 means going from 2 to 1.

I will refer to each element as “a_b”. For “reverse_indexes”, I am trying to find the indexes of the elements where b (in “a_b”) is equal to “station_name”, i.e. the destination is equal to the station name. For “regular_indexes”, I want the indexes where a (in “a_b”) is equal to “station_name”, i.e. the source is equal to the station name.
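
To make the setup concrete, here is a minimal sketch of the data layout and of a straightforward full scan. It is not the original reproducible code: the tiny `my_list` and the helper name `naive_indexes` are hypothetical, and it assumes station names never contain “_” themselves.

```python
from itertools import product

# Hypothetical stand-in for the real station list (the real one is far larger).
my_list = ["1", "2", "3"]

# Cross product of my_list with itself, each pair joined by "_".
station_combinations = [f"{a}_{b}" for a, b in product(my_list, repeat=2)]


def naive_indexes(station_name):
    """Scan the whole list for one station; cost grows with len(station_combinations)."""
    regular_indexes = []  # positions where the source a equals station_name
    reverse_indexes = []  # positions where the destination b equals station_name
    for i, combo in enumerate(station_combinations):
        a, b = combo.split("_")  # assumes exactly one "_" per element
        if a == station_name:
            regular_indexes.append(i)
        if b == station_name:
            reverse_indexes.append(i)
    return regular_indexes, reverse_indexes
```

Repeating such a scan for each of roughly 2000 stations walks the 2 million+ element list every time, which is the kind of cost that makes the loop slow.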

Problem: The code that I have works; however, it is very slow. If you look at the for loop (with cursor h), I iterate 10 times, but in the original code it is supposed to be approx. 2000 iterations. Even with 10 iterations it takes approx. 8 seconds on my computer. I am looking for ways to improve the speed significantly. I have tried the numba library; however, because I actually get some of the data from a data frame, I wasn’t able to make it work with the “@njit” functionality. Would anyone be able to help?


Answer

One solution is to use indexes, in this case two indexes: one for a and one for b. For example:


Prints on my machine:

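As an illustration of the two-index idea, a minimal sketch follows. It is not necessarily the code this answer originally contained: the names `build_indexes`, `by_source`, `by_destination` and `lookup` are hypothetical, the small `my_list` is a stand-in for the real data, and station names are assumed not to contain “_”.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-in data, built the way the question describes.
my_list = ["1", "2", "3"]
station_combinations = [f"{a}_{b}" for a, b in product(my_list, repeat=2)]


def build_indexes(combinations):
    """Single pass over the list: map each station to the positions where it appears."""
    by_source = defaultdict(list)       # a -> positions of "a_b"
    by_destination = defaultdict(list)  # b -> positions of "a_b"
    for i, combo in enumerate(combinations):
        a, b = combo.split("_")  # assumes exactly one "_" per element
        by_source[a].append(i)
        by_destination[b].append(i)
    return by_source, by_destination


by_source, by_destination = build_indexes(station_combinations)


def lookup(station_name):
    """Return (regular_indexes, reverse_indexes) without rescanning the list."""
    return by_source.get(station_name, []), by_destination.get(station_name, [])
```

The list is walked once to build both dictionaries; after that, each `lookup(station_name)` is a pair of dictionary accesses, so looping over about 2000 stations no longer rescans all 2 million+ elements.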
User contributions licensed under: CC BY-SA