I’m trying to find all non-overlap bit sequences satisfied the following constraints:
- starts with
0
- ends with
110
or111
- does not contain
000
Here is my non-regex code in python:
def f(s): ans = [] match = '' for i in s: if match=='': if i=='0': match = '0' continue match += i if match.endswith('110') or match.endswith('111'): ans.append(match) match = '' if match.endswith('000'): match = '00' return ans
My current regex pattern is (0(^(?!000))*(?:110|111))
. It doesn’t accept any bit between starting 0
and ending bits. For example, 010110
will return 0110
instead of expected 010110
. Adding d*?
, however, accepts sequences including 000
.
>>> s = 1000001110001011001101011000010101100001001100001111 >>> f(s) # expected ['00111', '0010110', '0110', '0110', '001010110', '00100110', '00111'] >>> pat1 = re.compile('(0(^(?!000))*(?:110|111))') >>> pat2 = re.compile('(0d*?(^(?!000)d+)*d*?(?:110|111))') >>> re.findall(pat1, s) [('0111', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0111', '')] >>> re.findall(pat2, s) [('00000111', ''), ('00010110', ''), ('0110', ''), ('0110', ''), ('0001010110', ''), ('000100110', ''), ('000111', '')]
Any help would be appreciated.
Advertisement
Answer
You could start the pattern by matching a zero, and assert not 2 zeroes directly to the right.
Then match until the first occurrence of either 111 or 110 with no occurrence of 000 in between.
0(?!00)(?:(?!000)[10])*?11[01]
0
Match a single 0(?!00)
Assert not00
directly to the right(?:
Non capture group(?!000)[10]
Assert not000
directly to the right and match either0
or1
)*?
Optionally repeat the non capture group, non greedy11[01]
Match either 110 or 111
Another shorter variation could be:
00?(?:1+0{0,2})*?11[01]
00?
Match 1 or 2 zeroes(?:
Non capture group1+0{0,2}
Match 1+ times a1
followed by 0, 1 or 2 times a zero
)*?
Optionally repeat the non capture group, non greedy11[01]
Match either 110 or 111