Skip to content
Advertisement

Generate random binary matrix constrained to no null row

I want to generate a random binary matrix, so I’m using W=np.random.binomial(1, p, (n,n)). It works fine, but I want a constraint that no row is just of 0s.

I create the following function:

def random_matrix(p,n):
    m=0
    while m==0:
        W = np.random.binomial(1, p, (n,n))
        m=min(W.sum(axis=1))
    return W

It also works fine, but it seems to me too inefficient. Is there a faster way to create this constraint?

Advertisement

Answer

When the matrix is large, regenerating the entire matrix just because few rows are full of zeros is not efficient. It should be statistically safe to only regenerate the target rows. Here is an example:

def random_matrix(p,n):
    W = np.random.binomial(1, p, (n,n))

    while True:
        null_rows = np.where(W.sum(axis=1) == 0)[0]
        # If there is no null row, then m>0 so we stop the replacement
        if null_rows.size == 0:
            break
        # Replace only the null rows
        W[null_rows] = np.random.binomial(1, p, (null_rows.shape[0],n))

    return W

Even faster solutions

There is an even more efficient approach when p is close to 0 (when p is close to 1, then the above function is already fast). Indeed, a binomial random variable with 0-1 values is a Bernoulli random variable. The sum of Bernoulli random values with a probability p repeated many times is a binomial random value! Thus, you can generate the sum for all row using S = np.random.binomial(n, p, (n,n)), then apply the above method to remove null values and then build the final matrix by generating S[i] one values for the ith row and use np.shuffle so to randomize the order of the 0-1 values in each row. This method solve conflicts much more efficiently than all others. Indeed, it does not need to generate the full row to check if it is full of zeros. It is n times faster to solve conflicts!

If this is not enough, you can use the uint8 datatype to generate W. Indeed, the memory is slow so generating smaller matrices is generally faster, not to mention it takes less RAM.

If this is not enough, you can generate S item per item using Numba JIT compiler and a basic loop. This should be faster since there is no temporary array to create except the final one. For large matrices, this algorithm can even be parallelized (every row can be independently generated). This last solution should be close to be optimal.

Advertisement