Skip to content
Advertisement

Algorithm verification: Get all the combinaison of possible word

I wanted to know if the algorithm that i wrotte just below in python is correct.

My goal is to find an algorithm that print/find all the possible combinaison of words that can be done using the character from character ‘!’ (decimal value = 33) to character ‘~’ (decimal value = 126) in the asccii table:

ascii table

Here the code using recursion:

byteWord = bytearray(b'x20')  # Hex = 'x21' & Dec = '33' & Char = '!'

cntVerif = 0  # Test-------------------------------------------------------------------------------------------------------


def comb_fct(bytes_arr, cnt: int):
    global cntVerif  # Test------------------------------------------------------------------------------------------------

    if len(bytes_arr) > 3:  # Test-----------------------------------------------------------------------------------------
        print(f'{cntVerif+1}:TEST END')
        sys.exit()

    if bytes_arr[cnt] == 126:
        if cnt == len(bytes_arr) or len(bytes_arr) == 1:
            bytes_arr.insert(0, 32)
        bytes_arr[cnt] = 32
        cnt += 1
        cntVerif += 1  # Test----------------------------------------------------------------------------------------------
        print(f'{cntVerif}:if bytes_arr[cnt] == 126: ntbytes_arr = {bytes_arr}')  # Test-------------------------------------------------------------------------------------------
        comb_fct(bytes_arr, cnt)

    if cnt == -1 or cnt == len(bytes_arr)-1:
        bytes_arr[cnt] = bytes_arr[cnt] + 1
        cntVerif += 1  # Test----------------------------------------------------------------------------------------------
        print(f'{cntVerif}:if cnt==-1: ntbytes_arr = {bytes_arr}')  # Test-------------------------------------------------------------------------------------------
        comb_fct(bytes_arr, cnt=-1)  # index = -1 means last index

    bytes_arr[cnt] = bytes_arr[cnt] + 1
    cntVerif += 1  # Test--------------------------------------------------------------------------------------------------
    print(f'{cntVerif}:None if: ntbytes_arr={bytes_arr}')  # Test-----------------------------------------------------------------------------------------------
    comb_fct(bytes_arr, cnt+1)


comb_fct(byteWord, -1)

Thank your for your help because python allow just a limited number of recursion (996 on my computer) so i for exemple i can’t verify if my algorithm give all the word of length 3 that can be realised with the range of character describe upper.

Of course if anyone has a better idea to writte this algorithm (a faster algorithm for exemple). I will be happy to read it.

Advertisement

Answer

Although you might be able to tweak this a bit, I think the code below is close to the most efficient solution to your problem, which I take to be “generate all possible sequences of maximum length N from a given set of characters”. That might be a bit more general than you need, since your set of characters is fixed, but the general solution is more useful and little overhead is added.

Note that the function is written as a generator, using functions from the itertools standard library module. Itertools is described as a set of “functions creating iterators for efficient looping” (emphasis added), and it indeed is. Generators are one of Python’s great features, since they allow you to easily and efficiently iterate over complex sequences. If you want to write efficient and “pythonic” code, you should familiarise yourself with these concepts (as well as other essential features, such as comprehensions). So I’m not going to explain these features further; please read the tutorial sections for details.

So here’s the simple solution:

from itertools import product, chain

def genseq(maxlen, chars):
    return map(''.join,
               chain.from_iterable(product(chars, repeat=i)
                                   for i in range(maxlen+1)))

# Example usage:
chars = ''.join(chr(i) for i in range(33, 127))
for word in genseq(4, chars):
    # Do something with word

There are 78,914,411 possible words (including the empty word); the above generates all of them in 7 seconds on my laptop. Much of that time is spent creating (and garbage collecting) those strings; you might well be able to do better using a bytearray and recycling it for each generated word. I didn’t try that.

For the record, here’s a simpler way of “unindexing” an enumeration of such strings. The enumeration starts with the empty word, followed by all 1-character words, then 2-character words, and so on. This ordering makes it unnecessary to specify the length (or even maximum length) of the resulting string.

def unindex(i, chars):
    v = []
    n = len(chars)
    while i > 0:
        i -= 1
        v.append(i % n)
        i //= n
    return ''.join(chars[j] for j in v[::-1])

# Example to generate the same words as above:
# chars as above
index_limit = (len(chars) ** 5 - 1) // (len(chars) - 1)
for i in range(0, index_limit):
    word = unindex(i, chars)
    # Do something with word

Again, you can probably speed this up a bit by using a recycled bytearray. As written above, it took about two minutes, sixteen times as long as my first version.

Note that using bytearrays in the way you do in your answer does not significantly speed things up, because it creates a new bytearray each time. In order to achieve the savings, you have to use a single bytearray for the entire generations, modifying it rather than recreating it. That’s more awkward in practice, because it means that if you need to keep a generated word around for later, perhaps because it passed some test, you must copy it. It’s easy to forget that, and the resulting bug can be very hard to track down.

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement