Python regex iteration for all combinations

I am new to regex. I am using Python 2.7 and BeautifulSoup4. I want to iterate over a particular regular expression.

Required output:

length : 5 , expression : [a-zA-Z0-9!&#%@]

It should try all possible combinations e.g:
['aaaaa', 'aaaab', 'aaaac', …, 'aaaaz', 'aaaaA', …, 'aaaaZ', 'aaaa0', 'aaaa9', 'aaaa!', 'AAA!!']

Moreover, this should be possible too, if the expression is oranged{1}:

['orangea', 'oranges']

I tried this:

 regexInput = "a-z0-9"
 #regexInput = "a-zA-Z0-9!@#$%^&"
 comb = itertools.permutations(regexInput,passLength)
 for x in comb:
    ''.join(x)

I realized that this is a totally wrong approach, as these are just permutations. Please help. Sorry for the bad explanation, I am very frustrated.

Answer

The itertools functions for permutations and combinations take a sequence of elements as their first parameter. They cannot expand a range for you (from a-z to abc...xyz). Fortunately, the string module offers constants such as ascii_letters, which contains a-zA-Z.
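
To make the range point concrete, here is a minimal sketch of a helper that expands shorthand such as a-z0-9 into the explicit characters that itertools needs. The name expand_ranges is hypothetical (it is not part of itertools or string), and it only understands simple x-y ranges and literal characters, not real regex syntax.

```python
def expand_ranges(spec):
    """Expand shorthand like 'a-z0-9' into the explicit characters.

    Hypothetical helper: handles only 'x-y' ranges and literal characters.
    """
    chars = []
    i = 0
    while i < len(spec):
        # A range is a start character, a '-', and an end character.
        if i + 2 < len(spec) and spec[i + 1] == '-':
            start, end = ord(spec[i]), ord(spec[i + 2])
            chars.extend(chr(c) for c in range(start, end + 1))
            i += 3
        else:
            # Anything else is taken literally.
            chars.append(spec[i])
            i += 1
    return ''.join(chars)

alphabet = expand_ranges("a-z0-9")
print(alphabet)
```

The resulting string can be passed directly as the first argument of any itertools function.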

If your goal is to interpret the regex and generate every case, that is pretty hard, and you should explain why you need it before we go further.

If you just want the combinations of alphabetical letters:

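import string
from itertools import combinations_with_replacement

# Lazy iterator: nothing is generated until it is consumed.
# Note: each selection is emitted once, in input order, so 'aaaba' never appears.
result = combinations_with_replacement(string.ascii_letters, 5)

#comb = [''.join(n) for n in result] # warning, heavy processing

# next() and print() with one argument work on both Python 2.7 and Python 3
print([''.join(next(result)) for _ in range(10)])
# > ['aaaaa', 'aaaab', 'aaaac', 'aaaad', 'aaaae', 'aaaaf', 'aaaag', 'aaaah', 'aaaai', 'aaaaj']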

You can replace string.ascii_letters with any sequence of characters.
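
If every ordered string of the given length is needed (including ones like 'aaaba', which combinations_with_replacement skips), itertools.product with the repeat argument is one option. The following is a minimal sketch, assuming the character set and length from the question; the names alphabet, all_strings, first and total are illustrative only.

```python
import string
from itertools import islice, product

# Character set from the question: [a-zA-Z0-9!&#%@]
alphabet = string.ascii_letters + string.digits + "!&#%@"
length = 5

# Lazy generator of every ordered string of the given length.
all_strings = (''.join(chars) for chars in product(alphabet, repeat=length))

# Peek at the first few without materialising the whole set.
first = list(islice(all_strings, 5))
print(first)

# Total number of strings is len(alphabet) ** length.
total = len(alphabet) ** length
print(total)
```

With 67 characters and length 5 that is 1,350,125,107 strings, so iterating lazily is likely a better idea than building a list.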

User contributions licensed under: CC BY-SA