I’m having issues implementing the block frequency test in Python to understand the randomness of a binary string. I was wondering if anyone would be able to help me out in understanding why the code wont run.
Also, are there any statistical tests to test the randomness of a binary string in Python or possibly Matlab?
JavaScript
x
38
38
1
from importlib import import_module
2
import_module
3
from tokenize import Special
4
import math
5
def block_frequency(self, bin_data: str, block_size=4):
6
"""
7
Note that this description is taken from the NIST documentation [1]
8
[1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
9
The focus of this tests is the proportion of ones within M-bit blocks. The purpose of this tests is to determine
10
whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
11
of randomness. For block size M=1, this test degenerates to the monobit frequency test.
12
:param bin_data: a binary string
13
:return: the p-value from the test
14
:param block_size: the size of the blocks that the binary sequence is partitioned into
15
"""
16
# Work out the number of blocks, discard the remainder
17
(num_blocks)= math.floor((1010110001001011011010111110010000000011010110111000001101) /4)
18
block_start, block_end = 0, 4
19
# Keep track of the proportion of ones per block
20
proportion_sum = 0.0
21
for i in range(num_blocks):
22
# Slice the binary string into a block
23
block_data = (101010001001011011010111110010000000011010110111000001101)[block_start:block_end]
24
# Keep track of the number of ones
25
ones_count = 0
26
for char in block_data:
27
if char == '1':
28
ones_count += 1
29
pi = ones_count / 4
30
proportion_sum += pow(pi - 0.5, 2.0)
31
# Update the slice locations
32
block_start += 4
33
block_end += 4
34
# Calculate the p-value
35
chi_squared = 4.0 * 4 * proportion_sum
36
p_val = Special.gammaincc(num_blocks / 2, chi_squared / 2)
37
print(p_val)
38
Advertisement
Answer
There are three issues that I see with your code.
- Using a hardcoded value in two different places. This is bad practice and error prone. I know this probably isn’t what the OP was referring to, but it’s worth fixing while we’re at it.
- A string of binary bits (especially one comparing to “1” further down) should be encapsulated in quotation marks, not parentheses. That’s one of the errors being thrown, ’cause the way it’s written now you’ve got a large integer which your trying to “index”. (This goes along with using
len
where necessary and some other minor changes). - You’re using the wrong module…You probably mean to use
scipy.special.gammainc
and nottokenize.Special.gammaincc
, which doesn’t exist anyhow.
Putting it all together, try something like:
JavaScript
1
43
43
1
from importlib import import_module
2
from scipy.special import gammainc
3
import_module
4
import math
5
6
7
def block_frequency(self, bin_data: str, block_size=4):
8
"""
9
Note that this description is taken from the NIST documentation [1]
10
[1] http://csrc.nist.gov/publications/nistpubs/800-22-rev1a/SP800-22rev1a.pdf
11
The focus of this tests is the proportion of ones within M-bit blocks. The purpose of this tests is to determine
12
whether the frequency of ones in an M-bit block is approximately M/2, as would be expected under an assumption
13
of randomness. For block size M=1, this test degenerates to the monobit frequency test.
14
:param bin_data: a binary string
15
:return: the p-value from the test
16
:param block_size: the size of the blocks that the binary sequence is partitioned into
17
"""
18
19
20
# Work out the number of blocks, discard the remainder
21
my_binary_string = '101010001001011011010111110010000000011010110111000001101'
22
num_blocks = math.floor(len(my_binary_string) / 4)
23
block_start, block_end = 0, 4
24
# Keep track of the proportion of ones per block
25
proportion_sum = 0.0
26
for i in range(num_blocks):
27
# Slice the binary string into a block
28
block_data = my_binary_string[block_start:block_end]
29
# Keep track of the number of ones
30
ones_count = 0
31
for char in block_data:
32
if char == '1':
33
ones_count += 1
34
pi = ones_count / 4
35
proportion_sum += pow(pi - 0.5, 2.0)
36
# Update the slice locations
37
block_start += 4
38
block_end += 4
39
# Calculate the p-value
40
chi_squared = 4.0 * 4 * proportion_sum
41
p_val = gammainc(num_blocks / 2, chi_squared / 2)
42
print(p_val)
43