Element-wise random choice of a Series of lists (without a loop)

I want to randomly select an element from each list in a Series of lists.

import pandas as pd
import numpy as np

l=[['a','b','c'],['d','e','f'],['g','h','i'],['j','k','l'],['m','n','o']]
s = pd.Series(l)

So s is:

0    [a, b, c]
1    [d, e, f]
2    [g, h, i]
3    [j, k, l]
4    [m, n, o]
dtype: object

I know I can do the following:

s = pd.Series([np.random.choice(i) for i in s])

Which does work:

0    a
1    e
2    h
3    j
4    m
dtype: object

But I am wondering if there is a non-loop approach to do this?

For instance, assuming every list has the same length, you could build an array of random indices to pick a different position in each list:

i = np.random.randint(3, size=len(l))
#array([2, 2, 0, 1, 0])

But indexing with s[i] doesn't work, because it selects rows of s rather than an element within each list:

2    [g, h, i]
2    [g, h, i]
0    [a, b, c]
1    [d, e, f]
0    [a, b, c]
dtype: object

My motivation is to have something that works on a large number of lists, hence the wish to avoid a loop. But if my list comprehension is the most reasonable approach, or there is no built-in pandas/NumPy function for this, please tell me.

Answer

I can only think of the following approach, which assumes every list has the same length (3 here), though performance may be a problem:

np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
# array(['c', 'e', 'i', 'k', 'n'], dtype='<U1')
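
To make the indexing step above easier to follow, here is a minimal sketch of the same idea wrapped in a function. The name pick_one_per_list is hypothetical, and np.random.default_rng is swapped in for np.random.randint purely so a seed can be passed; it still assumes every list has the same length.

```python
import numpy as np
import pandas as pd


def pick_one_per_list(s, rng=None):
    """Pick one random element from each list in `s` (equal-length lists only)."""
    # Hypothetical helper: accept an optional Generator so results can be seeded.
    rng = np.random.default_rng() if rng is None else rng

    # Stack the lists into a 2-D array of shape (n_lists, list_length).
    arr = np.array(s.tolist())
    if arr.ndim != 2:
        raise ValueError("all lists must have the same length")
    n_rows, n_cols = arr.shape

    # Draw one random column index per row.
    cols = rng.integers(n_cols, size=n_rows)

    # Pair row i with cols[i], which selects arr[i, cols[i]] for every i.
    picked = arr[np.arange(n_rows), cols]
    return pd.Series(picked, index=s.index)


l = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l'], ['m', 'n', 'o']]
s = pd.Series(l)
print(pick_one_per_list(s, rng=np.random.default_rng(0)))
```

The key difference from s[i] is that two index arrays are passed together (row numbers and column numbers), so NumPy picks one cell per row instead of whole rows.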

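Some timings. Note that the explode/sample line shuffles all of the exploded elements rather than picking one per list, so it is not an equivalent operation: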

%timeit s.explode().sample(frac=1, random_state=1) 
5.05 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit pd.Series([np.random.choice(i) for i in s])
23.1 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
1.63 ms ± 50.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
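
The fastest expression timed above only works when every list has the same length, because np.array(s.tolist()) has to form a rectangular array. If the lists differ in length, one possible sketch is to flatten them and index into the flat array using per-list offsets. The function name pick_one_per_ragged_list is hypothetical, and computing the lengths with s.map(len) still iterates over the Series internally; only the random draw and the lookup are vectorized.

```python
import numpy as np
import pandas as pd


def pick_one_per_ragged_list(s, rng=None):
    """Pick one random element from each list in `s`; lengths may differ."""
    # Hypothetical helper: accept an optional Generator so results can be seeded.
    rng = np.random.default_rng() if rng is None else rng

    # Length of each list (assumption: every list is non-empty).
    lengths = s.map(len).to_numpy(dtype=np.int64)
    if (lengths == 0).any():
        raise ValueError("cannot choose from an empty list")

    # Flatten all lists into one 1-D array, keeping the original order.
    flat = s.explode().to_numpy()

    # Position in `flat` where each list starts.
    starts = np.cumsum(lengths) - lengths

    # One random offset per list, drawn from [0, length of that list).
    within = rng.integers(0, lengths)

    return pd.Series(flat[starts + within], index=s.index)


ragged = pd.Series([['a'], ['b', 'c'], ['d', 'e', 'f', 'g']])
print(pick_one_per_ragged_list(ragged, rng=np.random.default_rng(1)))
```

This has not been benchmarked against the expressions above, so whether it beats the list comprehension on a given dataset would need to be measured.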
