Skip to content
Advertisement

Find bit sequences that do not include specific sequence

I’m trying to find all non-overlap bit sequences satisfied the following constraints:

  • starts with 0
  • ends with 110 or 111
  • does not contain 000

Here is my non-regex code in python:

def f(s):
    ans = []
    match = ''
    for i in s:
        if match=='':
            if i=='0':
                match = '0'
            continue
        match += i
        if match.endswith('110') or match.endswith('111'):
            ans.append(match)
            match = ''
        if match.endswith('000'):
            match = '00'
    return ans

My current regex pattern is (0(^(?!000))*(?:110|111)). It doesn’t accept any bit between starting 0 and ending bits. For example, 010110 will return 0110 instead of expected 010110. Adding d*?, however, accepts sequences including 000.

>>> s = 1000001110001011001101011000010101100001001100001111
>>> f(s) # expected
['00111', '0010110', '0110', '0110', '001010110', '00100110', '00111']
>>> pat1 = re.compile('(0(^(?!000))*(?:110|111))')
>>> pat2 = re.compile('(0d*?(^(?!000)d+)*d*?(?:110|111))')
>>> re.findall(pat1, s)
[('0111', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0111', '')]
>>> re.findall(pat2, s)
[('00000111', ''), ('00010110', ''), ('0110', ''), ('0110', ''), ('0001010110', ''), ('000100110', ''), ('000111', '')]

Any help would be appreciated.

Advertisement

Answer

You could start the pattern by matching a zero, and assert not 2 zeroes directly to the right.

Then match until the first occurrence of either 111 or 110 with no occurrence of 000 in between.

0(?!00)(?:(?!000)[10])*?11[01]
  • 0 Match a single 0
  • (?!00) Assert not 00 directly to the right
  • (?: Non capture group
    • (?!000)[10] Assert not 000 directly to the right and match either 0 or 1
  • )*? Optionally repeat the non capture group, non greedy
  • 11[01] Match either 110 or 111

Regex demo

Another shorter variation could be:

00?(?:1+0{0,2})*?11[01]
  • 00? Match 1 or 2 zeroes
  • (?: Non capture group
    • 1+0{0,2} Match 1+ times a 1 followed by 0, 1 or 2 times a zero
  • )*? Optionally repeat the non capture group, non greedy
  • 11[01] Match either 110 or 111

Regex demo

Advertisement