I’m writing an optimised Floyd-Warshall algorithm on the GPU using Numba. I need it to run in a few seconds for 10k matrices, but right now the processing takes around 60s. Here is my code:

To be honest I’m pretty new to writing GPU scripts, so do you have any ideas how to make this code even faster?
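
For reference, since the actual code block didn’t come through above, here is a minimal sketch of the straightforward Numba CUDA approach I have in mind: one kernel launch per pivot vertex `k`, with one thread per `(i, j)` cell. The names `fw_step` and `floyd_warshall_gpu` and the block/thread sizes are just placeholders for illustration, not my exact code.

```python
import numpy as np
from numba import cuda

@cuda.jit
def fw_step(dist, k):
    # One thread per (i, j) cell; relax paths that go through vertex k.
    i, j = cuda.grid(2)
    n = dist.shape[0]
    if i < n and j < n:
        alt = dist[i, k] + dist[k, j]
        if alt < dist[i, j]:
            dist[i, j] = alt

def floyd_warshall_gpu(adj):
    n = adj.shape[0]
    d_dist = cuda.to_device(np.ascontiguousarray(adj, dtype=np.float32))
    threads = (16, 16)
    blocks = ((n + threads[0] - 1) // threads[0],
              (n + threads[1] - 1) // threads[1])
    for k in range(n):
        # Launches on the same stream execute in order, so iteration k
        # sees the fully relaxed matrix from iteration k - 1.
        fw_step[blocks, threads](d_dist, k)
    return d_dist.copy_to_host()


if __name__ == "__main__":
    # Small usage example with a random dense weight matrix.
    n = 1024
    adj = np.random.rand(n, n).astype(np.float32)
    np.fill_diagonal(adj, 0.0)
    result = floyd_warshall_gpu(adj)
    print(result[:3, :3])
```

My real code follows roughly this structure, just with a larger matrix.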