I want to sample from a list until all elements have appeared at least once. We can use tossing a die as an example. A die has six sides: 1 through 6. I keep tossing it until I see all six values at least once, then I stop. Here is my function.
import numpy as np
def sample_until_all(all_values):
sampled_vals = np.empty(0)
while True:
cur_val = np.random.choice(all_values, 1)
sampled_vals = np.append(sampled_vals, cur_val[0])
if set(all_values) == set(sampled_vals):
return(len(sampled_vals))
sample_until_all(range(6))
cur_val
is the value from the current toss. I keep all sampled values in sampled_vals
using np.append
, and I check if it contains all possible values after each toss using set(all_values) == set(sampled_vals)
. It works but not efficiently (I believe). Any ideas how to make it faster? Thanks.
I just use this as a toy example. The actual list I need is much larger than just 6 values.
Advertisement
Answer
I don’t think numpy is really useful here as you generate your values one by one.
What about using a Counter?
from collections import Counter
import random
def sample_until_all(all_values):
all_values = list(all_values)
c = Counter()
while len(c)<len(all_values):
c[random.choice(all_values)] +=1
return sum(c.values())
sample_until_all(range(6))
This code is about 50 times faster for range(6)
, and the difference is even greater when the range is larger