This may not useful. It’s just a challenge I have set up for myself.
Let’s say you have a big array. What can you do so that the program does not benefit from caching, cache line prefetching or the fact that the next memory access can only be determined after the first access finishes.
So we have our array:
array = [0] * 10000000
What would be the best way to deoptimize the memory access if you had to access all elements in a loop? The idea is to increase the access time of each memory location as much as possible
I’m not looking for a solution which proposes to do “something else” (which takes time) before doing the next access. The idea is really to increase the access time as much as possible. I guess we have to traverse the array in a certain way (perhaps randomly? I’m still looking into it)
Advertisement
Answer
I did not expect any difference, but in fact accessing the digits in random order is significantly slower than accessing them in order or in reverse order (which is both about the same).
>>> N = 10**5 >>> arr = [random.randint(0, 1000) for _ in range(N)] >>> srt = list(range(N)) >>> rvd = srt[::-1] >>> rnd = random.sample(srt, N) >>> %timeit sum(arr[i] for i in srt) 10 loops, best of 5: 24.9 ms per loop >>> %timeit sum(arr[i] for i in rvd) 10 loops, best of 5: 25.7 ms per loop >>> %timeit sum(arr[i] for i in rnd) 10 loops, best of 5: 59.2 ms per loop
And it really seems to be the randomness. Just accessing indices out of order, but with a pattern, e.g. as [0, N-1, 2, N-3, ...]
or [0, N/2, 1, N/2+1, ...]
, is just as fast as accessing them in order:
>>> alt1 = [i if i % 2 == 0 else N - i for i in range(N)] >>> alt2 = [i for p in zip(srt[:N//2], srt[N//2:]) for i in p] >>> %timeit sum(arr[i] for i in alt1) 10 loops, best of 5: 24.5 ms per loop >>> %timeit sum(arr[i] for i in alt2) 10 loops, best of 5: 24.1 ms per loop
Interestingly, just iterating the shuffled indices (and calculating their sum
as with the array above) is also slower than doing the same with the sorted indices, but not as much. Of the ~35ms difference between srt
and rnd
, ~10ms seem to come from iterating the randomized indices, and ~25ms for actually accessing the indices in random order.
>>> %timeit sum(i for i in srt) 100 loops, best of 5: 19.7 ms per loop >>> %timeit sum(i for i in rnd) 10 loops, best of 5: 30.5 ms per loop >>> %timeit sum(arr[i] for i in srt) 10 loops, best of 5: 24.5 ms per loop >>> %timeit sum(arr[i] for i in rnd) 10 loops, best of 5: 56 ms per loop
(IPython 5.8.0 / Python 3.7.3 on a rather old laptop running Linux)