I’m trying to find all non-overlap bit sequences satisfied the following constraints:
- starts with
0 - ends with
110or111 - does not contain
000
Here is my non-regex code in python:
def f(s):
ans = []
match = ''
for i in s:
if match=='':
if i=='0':
match = '0'
continue
match += i
if match.endswith('110') or match.endswith('111'):
ans.append(match)
match = ''
if match.endswith('000'):
match = '00'
return ans
My current regex pattern is (0(^(?!000))*(?:110|111)). It doesn’t accept any bit between starting 0 and ending bits. For example, 010110 will return 0110 instead of expected 010110. Adding d*?, however, accepts sequences including 000.
>>> s = 1000001110001011001101011000010101100001001100001111
>>> f(s) # expected
['00111', '0010110', '0110', '0110', '001010110', '00100110', '00111']
>>> pat1 = re.compile('(0(^(?!000))*(?:110|111))')
>>> pat2 = re.compile('(0d*?(^(?!000)d+)*d*?(?:110|111))')
>>> re.findall(pat1, s)
[('0111', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0111', '')]
>>> re.findall(pat2, s)
[('00000111', ''), ('00010110', ''), ('0110', ''), ('0110', ''), ('0001010110', ''), ('000100110', ''), ('000111', '')]
Any help would be appreciated.
Advertisement
Answer
You could start the pattern by matching a zero, and assert not 2 zeroes directly to the right.
Then match until the first occurrence of either 111 or 110 with no occurrence of 000 in between.
0(?!00)(?:(?!000)[10])*?11[01]
0Match a single 0(?!00)Assert not00directly to the right(?:Non capture group(?!000)[10]Assert not000directly to the right and match either0or1
)*?Optionally repeat the non capture group, non greedy11[01]Match either 110 or 111
Another shorter variation could be:
00?(?:1+0{0,2})*?11[01]
00?Match 1 or 2 zeroes(?:Non capture group1+0{0,2}Match 1+ times a1followed by 0, 1 or 2 times a zero
)*?Optionally repeat the non capture group, non greedy11[01]Match either 110 or 111