
How to define a selection condition in a regex in Python

I have a string containing binary digits. I want to count the occurrences of a given pattern, but only matches that are 7 or more characters long should be counted. In other words, how can I restrict the pattern (something like pattern = r"(0+1+0+1+0+1+)" {<7}) so that only matches of 7 characters and above are selected? Please note: it should select matches of 7 characters and above, not just matches of exactly 7 characters.

import re
from collections import Counter



pattern = r"0+1+0+1+0+1+"

# Parentheses let the two adjacent string literals span multiple lines
test_str = ('01010100110011001100011100011110000101101110100001101011000111011001010011001001001101000011'
            '00110011001100110011010101001100110001111110010100100111010110001001100010011010110011')

cnt = Counter(re.findall(pattern, test_str))
print(cnt.most_common())

# result [('010101', 2), ('00110001110001111', 1), ('011101000011', 1), ('0111011001', 1), ('010010011', 1), ('001100110011', 1), ('00011111100101', 1), ('010110001', 1)]

The result should show only matches of 7 or more characters; it is not supposed to show ('010101', 2).


Answer

The simplest solution is to filter the list of regex matches.

import re
from collections import Counter

pattern = r"0+1+0+1+0+1+"

# Parentheses let the two adjacent string literals span multiple lines
test_str = ('01010100110011001100011100011110000101101110100001101011000111011001010011001001001101000011'
            '00110011001100110011010101001100110001111110010100100111010110001001100010011010110011')

cnt = Counter([p for p in re.findall(pattern, test_str) if len(p) > 6])
print(cnt.most_common())

Output:

[('001100110011', 2), ('000111000111100001', 1), ('011011101', 1), ('00001101011', 1), ('000111011001', 1), ('010011001', 1), ('001001101', 1), ('00001100110011', 1), ('00110011000111111', 1), ('00101001', 1), ('0011101011', 1), ('000100110001', 1), ('001101011', 1)]
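
If the minimum length needs to vary, the same filtering idea can be wrapped in a small helper. The following is only a sketch: the function name count_long_matches, its min_len parameter, and the short sample string are made up for illustration and are not part of the original question or answer. It uses re.finditer so the matches are filtered lazily instead of building the full list first.

```python
import re
from collections import Counter


def count_long_matches(pattern, text, min_len=7):
    """Count non-overlapping matches of `pattern` that are at least
    `min_len` characters long (hypothetical helper, for illustration)."""
    matches = (m.group(0) for m in re.finditer(pattern, text))
    return Counter(s for s in matches if len(s) >= min_len)


pattern = r"0+1+0+1+0+1+"
sample = "0101010011001100110001110001111"  # short illustrative sample

cnt = count_long_matches(pattern, sample)
print(cnt.most_common())
# [('001100110011', 1)]
```

Note that the filter is applied after matching, so a short match such as '010101' still consumes its characters; it is simply left out of the count.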
User contributions licensed under: CC BY-SA