I want to randomly select an element from each list in a Series of lists.
import pandas as pd
import numpy as np

l = [['a','b','c'],['d','e','f'],['g','h','i'],['j','k','l'],['m','n','o']]
s = pd.Series(l)
So s is:
0    [a, b, c]
1    [d, e, f]
2    [g, h, i]
3    [j, k, l]
4    [m, n, o]
dtype: object
I know I can do the following:
s = pd.Series([np.random.choice(i) for i in s])
This works:
0    a
1    e
2    h
3    j
4    m
dtype: object
But I am wondering whether there is a non-loop approach to do this.
For instance, assuming every list is the same size, you could make an array of random indices and use it to pick a random element from each list:
i = np.random.randint(3, size=len(l))  # e.g. array([2, 2, 0, 1, 0])
But doing, say, s[i] doesn't work, because that indexes the Series s itself rather than each list inside it:
2    [g, h, i]
2    [g, h, i]
0    [a, b, c]
1    [d, e, f]
0    [a, b, c]
dtype: object
My motivation is to have something that works on a large number of lists, hence the avoidance of a loop. But if my list comprehension is the most reasonable approach, or there is no built-in pandas/NumPy function for this, please tell me.
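If the lists can have different lengths, a single index array like i no longer applies directly. A minimal sketch that still avoids an explicit Python loop over the rows, assuming every list is non-empty: concatenate all elements into one flat array, compute where each list starts in it, and add a random offset below each list's length. The helper name pick_one_per_list is hypothetical, and it uses NumPy's Generator API (np.random.default_rng, Generator.integers, which accepts an array of upper bounds) instead of the legacy np.random.randint.

```python
import numpy as np
import pandas as pd


def pick_one_per_list(s: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Hypothetical helper: one random element from each list in s.

    Assumes every list is non-empty; lists may differ in length.
    """
    lengths = s.map(len).to_numpy()          # length of each list
    flat = np.concatenate(s.to_list())       # every element in one 1-D array
    starts = np.cumsum(lengths) - lengths    # where each list begins in flat
    offsets = rng.integers(lengths)          # one draw from [0, length) per list
    return pd.Series(flat[starts + offsets], index=s.index)


rng = np.random.default_rng(0)
ragged = pd.Series([['a', 'b', 'c'], ['d', 'e'], ['f'], ['g', 'h', 'i', 'j']])
print(pick_one_per_list(ragged, rng))
```

Building lengths and the concatenated array still touches every list once, but the random draw and the selection are single vectorized operations over all rows.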
Answer
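I can only think of this approach, which stacks the lists into a 2-D array and so requires every list to have the same length (here 3); however, performance may be a problem: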
np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
array(['c', 'e', 'i', 'k', 'n'], dtype='<U1')
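The same indexing trick can be wrapped so the number of choices is read from the stacked array instead of being hard-coded as 3, and so the result keeps the original index. This is a sketch, assuming equal-length, non-empty lists; the function name pick_one_equal_length is hypothetical, and np.random.default_rng is swapped in for np.random.randint.

```python
import numpy as np
import pandas as pd


def pick_one_equal_length(s: pd.Series, rng: np.random.Generator) -> pd.Series:
    """Hypothetical helper wrapping the 2-D indexing trick.

    Assumes every list has the same non-zero length, so s.tolist()
    stacks into a 2-D array of shape (n_rows, n_cols).
    """
    arr = np.array(s.tolist())
    n_rows, n_cols = arr.shape
    cols = rng.integers(n_cols, size=n_rows)  # one random column per row
    # advanced indexing pairs row k with column cols[k]
    return pd.Series(arr[np.arange(n_rows), cols], index=s.index)


l = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l'], ['m', 'n', 'o']]
s = pd.Series(l)
print(pick_one_equal_length(s, np.random.default_rng(0)))
```

If the lists are ragged, np.array cannot build a 2-D array from them, so this version is only suitable for the equal-length case described in the question.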
Some timings:
%timeit s.explode().sample(frac=1, random_state=1)
5.05 ms ± 294 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%timeit pd.Series([np.random.choice(i) for i in s])
23.1 ms ± 184 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit np.array(s.tolist())[np.arange(len(s)), np.random.randint(3, size=len(s))]
1.63 ms ± 50.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
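Note that s.explode().sample(frac=1, ...) returns every element of every list in shuffled order rather than one element per list. A pandas-only variant that does keep exactly one element per original row is to group the exploded Series by its index label and sample one row per group. This is a sketch assuming pandas 1.1 or later, where GroupBy.sample is available, and non-empty lists (explode turns an empty list into a NaN row).

```python
import pandas as pd

l = [['a', 'b', 'c'], ['d', 'e', 'f'], ['g', 'h', 'i'], ['j', 'k', 'l'], ['m', 'n', 'o']]
s = pd.Series(l)

# explode: one row per element, each row keeps its original list's index label
exploded = s.explode()
# keep exactly one randomly chosen row per original label
picked = exploded.groupby(level=0).sample(n=1, random_state=1)
print(picked)
```

Unlike the NumPy indexing version, this also works when the lists have different lengths; how it compares in speed on a given data size is best checked with %timeit as above.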