I want to sample from a list until all elements have appeared at least once. We can use tossing a die as an example. A die has six sides: 1 through 6. I keep tossing it until I see all six values at least once, then I stop. Here is my function.
import numpy as np def sample_until_all(all_values): sampled_vals = np.empty(0) while True: cur_val = np.random.choice(all_values, 1) sampled_vals = np.append(sampled_vals, cur_val[0]) if set(all_values) == set(sampled_vals): return(len(sampled_vals)) sample_until_all(range(6))
cur_val
is the value from the current toss. I keep all sampled values in sampled_vals
using np.append
, and I check if it contains all possible values after each toss using set(all_values) == set(sampled_vals)
. It works but not efficiently (I believe). Any ideas how to make it faster? Thanks.
I just use this as a toy example. The actual list I need is much larger than just 6 values.
Advertisement
Answer
I don’t think numpy is really useful here as you generate your values one by one.
What about using a Counter?
from collections import Counter import random def sample_until_all(all_values): all_values = list(all_values) c = Counter() while len(c)<len(all_values): c[random.choice(all_values)] +=1 return sum(c.values()) sample_until_all(range(6))
This code is about 50 times faster for range(6)
, and the difference is even greater when the range is larger