##Mock Data## my_list = list(range(1700)) import itertools cross_product = list(itertools.product(my_list,my_list)) station_combinations = ["_".join([str(i),str(b)]) for i,b in cross_product if i != b] ############### from time import time,sleep station_name = "5" start = time() for h in range(10): reverse_indexes = [count for count,j in enumerate(station_combinations) if j.split("_")[1] == station_name ] regular_indexes = [count for count,j in enumerate(station_combinations) if j.split("_")[0] == station_name ] print(time() - start )
Hello, I have shared the reproducible code above.
Background: Let me quickly introduce my data station_combinations is the cross product of “my_list” separated by the notation “_”. You can think of it as a destination among “my_list” items so 1_2 would be going from 1 to 2 whereas 2_1 would going from 2 to 1.
So I will refer as “a_b” Among all the combinations in “reverse_indexes”, I am trying to find the index of elements where b in ( “a_b” ) is equal to “station_name”, so the “destination” is equal to station name, and in the regular_indexes an in (“a_b”) the source is equal to the station_name
Problem: The code that I have works however it is very slow. if you look at the for loop (with cursor h) I iterate 10 times, however, in the original code, it is supposed to be approx. 2000. With even 10 iterations it approx. takes 8seconds on my computer. I am looking for ways to improve the speed significantly. I have tried the library numba, however because I actually get some of the data from a data frame I wasn’t able to work it out with the “@njit” functionality. Would anyone be able to help?
Advertisement
Answer
One solution can be using indexes, in this case two indexes for a
and b
. For example:
my_list = list(range(1700)) import itertools cross_product = list(itertools.product(my_list, my_list)) station_combinations = [ "_".join([str(i), str(b)]) for i, b in cross_product if i != b ] # precompute indexes: index_a = {} index_b = {} for i, s in enumerate(station_combinations): a, b = s.split("_") index_a.setdefault(a, []).append(i) index_b.setdefault(b, []).append(i) from time import time, sleep station_name = "5" start = time() for h in range(10): reverse_indexes_new = index_b.get(station_name, []) regular_indexes_new = index_a.get(station_name, []) print(time() - start)
Prints on my machine:
7.62939453125e-06