Element-wise random choice of a Series of lists (without a loop)


I want to randomly select an element from each list in a Series of lists.

import pandas as pd
import numpy as np

l = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l'], ['m', 'n', 'o']]
s = pd.Series(l)

So s is:

0    [a, b, c]
1    [d, e, f]
2    [g, h, i]
3    [j, k, l]
4    [m, n, o]
dtype: object

I know I can do the following:

s = pd.Series([np.random.choice(i) for i in s])

Which does work:

0    a
1    e
2    h
3    j
4    m
dtype: object

But is there a non-loop approach to do this?

For instance, assuming every list has the same length, you could generate an array of random indices, one per list, to pick a different element from each:

i = np.random.randint(3, size=len(l))
#array([2, 2, 0, 1, 0])

But s[i] doesn’t work, because it indexes the Series itself rather than indexing into each list:

2    [g, h, i]
2    [g, h, i]
0    [a, b, c]
1    [d, e, f]
0    [a, b, c]
dtype: object

My motivation is to have something that works on a large number of lists, hence the wish to avoid a loop. But if my list comprehension is the most reasonable option, or there is no built-in pandas/numpy function for this, please tell me.


I can only think of the following way; however, performance may be a problem:

np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
array(['c', 'e', 'i', 'k', 'n'], dtype='<U1')
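Broken into steps, the one-liner above might look like the following. This is only a sketch: it assumes every list has the same length (so the Series stacks into a 2-D array), and the names `arr`, `cols`, and `picked` and the seed are illustrative choices, not part of the original answer. It uses `np.random.default_rng` in place of `np.random.randint`; either works.

```python
import numpy as np
import pandas as pd

s = pd.Series([list("abc"), list("def"), list("ghi"), list("jkl"), list("mno")])

# Assumption: all lists have equal length, so they stack into shape (5, 3).
arr = np.array(s.tolist())

rng = np.random.default_rng(0)  # seed is arbitrary, only for reproducibility
# One random column index per row, each in [0, number_of_columns).
cols = rng.integers(arr.shape[1], size=arr.shape[0])

# Fancy indexing pairs row k with column cols[k], giving one element per list.
picked = pd.Series(arr[np.arange(arr.shape[0]), cols], index=s.index)
print(picked)
```

Passing `index=s.index` keeps the result aligned with the original Series, which the bare NumPy array would not do.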

Some timings:

%timeit s.explode().sample(frac=1, random_state=1) 
5.05 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.Series([np.random.choice(i) for i in s])
23.1 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
1.63 ms ± 50.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
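Note that `s.explode().sample(frac=1)`, as timed above, returns every element in shuffled order rather than one element per list. If the lists are not all the same length, so the 2-D array trick does not apply, one possible variant is to sample within each original row after exploding. This is a sketch that assumes pandas 1.1 or later (for `GroupBy.sample`) and no empty lists; the ragged example data is made up for illustration, and it has not been timed here.

```python
import pandas as pd

# Hypothetical ragged input: lists of different lengths.
s = pd.Series([["a", "b"], ["c"], ["d", "e", "f"]])

# explode() gives one row per element, repeating the original index label.
flat = s.explode()

# Group by the original index label and draw one element from each group.
picked = flat.groupby(level=0).sample(n=1, random_state=0).sort_index()
print(picked)
```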

Source: stackoverflow