I’m having issues implementing the block frequency test in Python to assess the randomness of a binary string. Could anyone help me understand why the code won’t run?
Also, are there any statistical tests for the randomness of a binary string available in Python or possibly MATLAB?
    from importlib import import_module
    import_module
    from tokenize import Special
    import math

    def block_frequency(self, bin_data: str, block_size=4):
        """
        Note that this description is taken from the NIST documentation [1]
        [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf

        The focus of this tests is the proportion of ones within M-bit blocks. The purpose of this tests is to
        determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected
        under an assumption of randomness. For block size M=1, this test degenerates to the monobit frequency test.

        :param bin_data: a binary string
        :return: the p-value from the test
        :param block_size: the size of the blocks that the binary sequence is partitioned into
        """
        # Work out the number of blocks, discard the remainder
        (num_blocks)= math.floor((1010110001001011011010111110010000000011010110111000001101) /4)
        block_start, block_end = 0, 4
        # Keep track of the proportion of ones per block
        proportion_sum = 0.0
        for i in range(num_blocks):
            # Slice the binary string into a block
            block_data = (101010001001011011010111110010000000011010110111000001101)[block_start:block_end]
            # Keep track of the number of ones
            ones_count = 0
            for char in block_data:
                if char == '1':
                    ones_count += 1
            pi = ones_count / 4
            proportion_sum += pow(pi - 0.5, 2.0)
            # Update the slice locations
            block_start += 4
            block_end += 4
        # Calculate the p-value
        chi_squared = 4.0 * 4 * proportion_sum
        p_val = Special.gammaincc(num_blocks / 2, chi_squared / 2)
        print(p_val)
Answer
There are three issues that I see with your code.
- The block size and the bit string are hardcoded in more than one place (and the two copies of the bit string don’t even match). This is bad practice and error-prone. I know this probably isn’t the error you’re asking about, but it’s worth fixing while we’re at it.
- A string of binary digits (especially one whose characters are compared to `'1'` further down) should be wrapped in quotation marks, not parentheses. That’s one of the errors being thrown, because the way it’s written now you’ve got a large integer which you’re trying to index. (This goes along with using `len` where necessary and some other minor changes.)
- You’re using the wrong module. You probably mean `scipy.special.gammaincc`, not `tokenize.Special.gammaincc`, which doesn’t exist anyhow. Mind the double `c`: the NIST document defines the p-value with `igamc`, the regularized upper incomplete gamma function, which SciPy calls `gammaincc`. `gammainc` is the lower one and would give you one minus the p-value.
Putting it all together, try something like:

    import math

    from scipy.special import gammaincc


    def block_frequency(bin_data: str, block_size=4):
        """
        Note that this description is taken from the NIST documentation [1]
        [1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf

        The focus of this test is the proportion of ones within M-bit blocks. The purpose of this test is to
        determine whether the frequency of ones in an M-bit block is approximately M/2, as would be expected
        under an assumption of randomness. For block size M=1, this test degenerates to the monobit frequency test.

        :param bin_data: a binary string
        :param block_size: the size of the blocks that the binary sequence is partitioned into
        :return: the p-value from the test
        """
        # Work out the number of blocks, discard the remainder
        num_blocks = math.floor(len(bin_data) / block_size)
        block_start, block_end = 0, block_size
        # Keep track of the proportion of ones per block
        proportion_sum = 0.0
        for i in range(num_blocks):
            # Slice the binary string into a block
            block_data = bin_data[block_start:block_end]
            # Keep track of the number of ones
            ones_count = 0
            for char in block_data:
                if char == '1':
                    ones_count += 1
            pi = ones_count / block_size
            proportion_sum += pow(pi - 0.5, 2.0)
            # Update the slice locations
            block_start += block_size
            block_end += block_size
        # Calculate the p-value (gammaincc is the regularized upper incomplete gamma, NIST's igamc)
        chi_squared = 4.0 * block_size * proportion_sum
        p_val = gammaincc(num_blocks / 2, chi_squared / 2)
        return p_val


    # Define the bit string once, as a string, and pass it in
    my_binary_string = '101010001001011011010111110010000000011010110111000001101'
    print(block_frequency(my_binary_string))
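If you want to cross-check the result without SciPy, here is a minimal sketch that uses only the standard library. The names `igamc` and `block_frequency_p_value` are made up for this illustration (the first just borrows NIST's notation). It assumes the first argument of the incomplete gamma function is always a positive multiple of 1/2, which holds here because it is the block count divided by two, and it builds the value from Q(1/2, x) = erfc(sqrt(x)), Q(1, x) = exp(-x) and the recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Gamma(a + 1). Treat it as a sanity check, not as a replacement for `scipy.special.gammaincc`.

```python
import math


def igamc(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x), for a a positive multiple of 1/2."""
    if x <= 0.0:
        return 1.0  # Q(a, 0) = 1
    twice_a = round(2 * a)
    if twice_a < 1 or abs(2 * a - twice_a) > 1e-12:
        raise ValueError("a must be a positive multiple of 1/2")
    if twice_a % 2:
        # Half-integer a: start from Q(1/2, x) = erfc(sqrt(x))
        current, q = 0.5, math.erfc(math.sqrt(x))
    else:
        # Integer a: start from Q(1, x) = exp(-x)
        current, q = 1.0, math.exp(-x)
    while current < a - 1e-12:
        # Q(c + 1, x) = Q(c, x) + x**c * exp(-x) / Gamma(c + 1), computed in log space
        q += math.exp(current * math.log(x) - x - math.lgamma(current + 1.0))
        current += 1.0
    return q


def block_frequency_p_value(bits: str, block_size: int) -> float:
    """P-value of the block frequency test for a string of '0' and '1' characters."""
    num_blocks = len(bits) // block_size  # the remainder is discarded
    if num_blocks == 0:
        raise ValueError("the sequence is shorter than one block")
    proportion_sum = 0.0
    for i in range(num_blocks):
        block = bits[i * block_size:(i + 1) * block_size]
        pi = block.count("1") / block_size
        proportion_sum += (pi - 0.5) ** 2
    chi_squared = 4.0 * block_size * proportion_sum
    return igamc(num_blocks / 2, chi_squared / 2)


# Worked example from the block frequency section of SP 800-22: n = 10, M = 3
print(round(block_frequency_p_value("0110011010", 3), 6))  # 0.801252
```

The printed value agrees with the P-value given for that example in the NIST document, so it makes a handy regression check for whichever implementation you end up using.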