Is there a faster version of numpy.random.shuffle?

I’m using numpy.random.shuffle in order to compute a statistic on randomized columns of a 2D array. The Python code is as follows:

import numpy as np

def timeline_sample(series, num):
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random

The speed I get is something like this:

import numpy as np
arr = np.random.sample((50, 5000))

%%timeit
for series in timeline_sample(arr, 100):
    np.sum(series)
1 loops, best of 3: 391 ms per loop

I tried to Cythonize this function, but I wasn’t sure how to replace the call to np.random.shuffle, and the Cython version came out 3x slower. Does anyone know how to accelerate or replace it? It is currently the bottleneck in my program.

Cython code:

cimport cython

import numpy as np
cimport numpy as np


@cython.boundscheck(False)
@cython.wraparound(False)
def timeline_sample2(double[:, ::1] series, int num):
    cdef double[:, ::1] random = series.copy()
    cdef int i
    for i in range(num):
        np.random.shuffle(random.T)
        yield random
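
For reference, the shuffle itself can be spelled out as an explicit Fisher–Yates pass over columns, which is the kind of loop a Cython version could then compile down to C instead of calling back into np.random.shuffle. A minimal pure-NumPy sketch of the idea (my illustration, not code from the question):

import numpy as np

def shuffle_columns_inplace(a):
    # Fisher-Yates over columns: walk from the last column down,
    # swapping each column with a uniformly chosen earlier (or same) one.
    n = a.shape[1]
    for i in range(n - 1, 0, -1):
        j = np.random.randint(i + 1)      # uniform on 0..i inclusive
        if i != j:
            a[:, [i, j]] = a[:, [j, i]]   # swap whole columns in place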

Answer

It’s likely that this will give a nice speed boost:

from timeit import Timer

import numpy as np
arr = np.random.sample((50, 5000))

def timeline_sample(series, num):
    random = series.copy()
    for i in range(num):
        np.random.shuffle(random.T)
        yield random

def timeline_sample_fast(series, num):
    random = series.T.copy()
    for i in range(num):
        np.random.shuffle(random)
        yield random.T

def timeline_sample_faster(series, num):
    length = series.shape[1]
    for i in range(num):
        yield series[:, np.random.permutation(length)]

def consume(iterable):
    for s in iterable:
        np.sum(s)

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2585161680035526
#>>> 0.2416607110062614
#>>> 0.04835709399776533
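
The gap comes from how the work is done: shuffling the transpose moves one non-contiguous row at a time through Python-level calls, while np.random.permutation plus fancy indexing gathers all the columns in a single vectorized pass. A small check (my example, not part of the benchmark) that the permutation approach really reorders whole columns:

import numpy as np

a = np.arange(12, dtype=float).reshape(3, 4)
perm = np.random.permutation(a.shape[1])   # a random ordering of the column indices
shuffled = a[:, perm]                      # one vectorized gather; returns a copy

# Every original column appears exactly once, just in a new position.
assert sorted(map(tuple, shuffled.T)) == sorted(map(tuple, a.T))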

Forcing the yielded array to be contiguous does increase the time, but not by a ton:

def consume(iterable):
    for s in iterable:
        np.sum(np.ascontiguousarray(s))

min(Timer(lambda: consume(timeline_sample(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_fast(arr, 1))).repeat(10, 10))
min(Timer(lambda: consume(timeline_sample_faster(arr, 1))).repeat(10, 10))
#>>> 0.2632228760048747
#>>> 0.25778737501241267
#>>> 0.07451769898761995
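
On newer NumPy (1.17+), the same trick can be written with the Generator API; a sketch of the equivalent (the seed parameter is my addition):

import numpy as np

def timeline_sample_rng(series, num, seed=None):
    # rng.permutation(n) returns a shuffled arange(n), as in the legacy API.
    rng = np.random.default_rng(seed)
    length = series.shape[1]
    for _ in range(num):
        yield series[:, rng.permutation(length)]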