
Is there a faster version of numpy.random.shuffle?

I’m using numpy.random.shuffle in order to compute a statistic on randomized columns of a 2D array. The Python code is as follows:

import numpy as np

def timeline_sample(series, num):
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random

The timing I get is something like this:

import numpy as np
arr = np.random.sample((50, 5000))

%%timeit
for series in timeline_sample(arr, 100):
    np.sum(series)

1 loops, best of 3: 391 ms per loop

I tried to Cythonize this function, but I wasn’t sure how to replace the call to np.random.shuffle, and the result was 3x slower. Does anyone know how to accelerate or replace this? It is currently the bottleneck in my program.

Cython code:

cimport cython

import numpy as np
cimport numpy as np


@cython.boundscheck(False)
@cython.wraparound(False)
def timeline_sample2(double[:, ::1] series, int num):
    cdef double[:, ::1] random = series.copy()
    cdef int i
    for i in range(num):
        np.random.shuffle(random.T)
        yield random
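For what it’s worth, the slowdown is not surprising: np.random.shuffle is still a Python-level NumPy call, so the typed memoryviews buy nothing, and shuffling the transposed view swaps one strided column at a time. Here is a rough pure-Python sketch of what that shuffle amounts to (an illustration of the access pattern, not NumPy’s actual implementation):

import numpy as np

# np.random.shuffle uses a Fisher-Yates shuffle; applied to a transposed
# view, each swap moves two non-contiguous columns of the original array.
def shuffle_columns_naive(a):
    n = a.shape[1]
    for i in range(n - 1, 0, -1):
        j = np.random.randint(i + 1)     # random column index in [0, i]
        a[:, [i, j]] = a[:, [j, i]]      # fancy-indexed swap copies both columns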


Answer

It’s likely that this will give a nice speed boost:

from timeit import Timer

import numpy as np
arr = np.random.sample((50, 5000))

def timeline_sample(series, num):
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random

def timeline_sample_fast(series, num):
    random = series.T.copy()          # work on a C-contiguous transpose
    for i in range(num):
        np.random.shuffle(random)     # shuffle rows, i.e. the original columns
        yield random.T

def timeline_sample_faster(series, num):
    length = series.shape[1]
    for i in range(num):
        # gather all columns at once with a fancy-indexed permutation
        yield series[:, np.random.permutation(length)]

def consume(iterable):
    for s in iterable:
        np.sum(s)

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2585161680035526
#>>> 0.2416607110062614
#>>> 0.04835709399776533
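The win comes from replacing many strided column swaps with one vectorized gather: np.random.permutation builds an index array in O(n), and the fancy-indexed read copies the whole result in a single pass. On recent NumPy you can get the same effect from the Generator API; this is a sketch assuming np.random.default_rng and the axis keyword of Generator.permutation are available in your version:

import numpy as np

rng = np.random.default_rng()

def timeline_sample_rng(series, num):
    # permutation(..., axis=1) returns a copy with the columns permuted,
    # equivalent to series[:, rng.permutation(series.shape[1])]
    for _ in range(num):
        yield rng.permutation(series, axis=1)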

Forcing it to be contiguous does increase the time, but not by a ton:

def consume(iterable):
    for s in iterable:
        np.sum(np.ascontiguousarray(s))

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2632228760048747
#>>> 0.25778737501241267
#>>> 0.07451769898761995
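If downstream code needs C-contiguous input, you can check what each variant actually yields before paying for a copy (this assumes the definitions above):

arr = np.random.sample((50, 5000))
for f in (timeline_sample, timeline_sample_fast, timeline_sample_faster):
    s = next(f(arr, 1))
    print(f.__name__, s.flags['C_CONTIGUOUS'])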

User contributions licensed under: CC BY-SA