Python: simulate tossing a die until all values have appeared at least once

I want to sample from a list until all elements have appeared at least once. We can use tossing a die as an example. A die has six sides: 1 through 6. I keep tossing it until I see all six values at least once, then I stop. Here is my function.

import numpy as np

def sample_until_all(all_values):
    sampled_vals = np.empty(0)
    while True:
        cur_val = np.random.choice(all_values, 1)
        sampled_vals = np.append(sampled_vals, cur_val[0])        
        if set(all_values) == set(sampled_vals):
            return(len(sampled_vals))
        
sample_until_all(range(6))

JavaScript
​x
 
import numpy as np
​
def sample_until_all(all_values):
    sampled_vals = np.empty(0)
    while True:
        cur_val = np.random.choice(all_values, 1)
        sampled_vals = np.append(sampled_vals, cur_val[0])        
        if set(all_values) == set(sampled_vals):
            return(len(sampled_vals))
        
sample_until_all(range(6))
​

cur_val is the value from the current toss. I keep all sampled values in sampled_vals using np.append, and I check if it contains all possible values after each toss using set(all_values) == set(sampled_vals). It works but not efficiently (I believe). Any ideas how to make it faster? Thanks.

I just use this as a toy example. The actual list I need is much larger than just 6 values.

Answer

I don’t think numpy is really useful here as you generate your values one by one.

What about using a Counter?

from collections import Counter
import random

def sample_until_all(all_values):
    all_values = list(all_values)
    c = Counter()
    while len(c)<len(all_values):
        c[random.choice(all_values)] +=1
    return sum(c.values())


sample_until_all(range(6))

JavaScript
 
from collections import Counter
import random
​
def sample_until_all(all_values):
    all_values = list(all_values)
    c = Counter()
    while len(c)<len(all_values):
        c[random.choice(all_values)] +=1
    return sum(c.values())
​
​
sample_until_all(range(6))
​

This code is about 50 times faster for range(6), and the difference is even greater when the range is larger

Advertisement

Answer