I’m trying to find all non-overlap bit sequences satisfied the following constraints:
- starts with
0
- ends with
110
or111
- does not contain
000
Here is my non-regex code in python:
JavaScript
x
16
16
1
def f(s):
2
ans = []
3
match = ''
4
for i in s:
5
if match=='':
6
if i=='0':
7
match = '0'
8
continue
9
match += i
10
if match.endswith('110') or match.endswith('111'):
11
ans.append(match)
12
match = ''
13
if match.endswith('000'):
14
match = '00'
15
return ans
16
My current regex pattern is (0(^(?!000))*(?:110|111))
. It doesn’t accept any bit between starting 0
and ending bits. For example, 010110
will return 0110
instead of expected 010110
. Adding d*?
, however, accepts sequences including 000
.
JavaScript
1
10
10
1
>>> s = 1000001110001011001101011000010101100001001100001111
2
>>> f(s) # expected
3
['00111', '0010110', '0110', '0110', '001010110', '00100110', '00111']
4
>>> pat1 = re.compile('(0(^(?!000))*(?:110|111))')
5
>>> pat2 = re.compile('(0d*?(^(?!000)d+)*d*?(?:110|111))')
6
>>> re.findall(pat1, s)
7
[('0111', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0110', ''), ('0111', '')]
8
>>> re.findall(pat2, s)
9
[('00000111', ''), ('00010110', ''), ('0110', ''), ('0110', ''), ('0001010110', ''), ('000100110', ''), ('000111', '')]
10
Any help would be appreciated.
Advertisement
Answer
You could start the pattern by matching a zero, and assert not 2 zeroes directly to the right.
Then match until the first occurrence of either 111 or 110 with no occurrence of 000 in between.
JavaScript
1
2
1
0(?!00)(?:(?!000)[10])*?11[01]
2
0
Match a single 0(?!00)
Assert not00
directly to the right(?:
Non capture group(?!000)[10]
Assert not000
directly to the right and match either0
or1
)*?
Optionally repeat the non capture group, non greedy11[01]
Match either 110 or 111
Another shorter variation could be:
JavaScript
1
2
1
00?(?:1+0{0,2})*?11[01]
2
00?
Match 1 or 2 zeroes(?:
Non capture group1+0{0,2}
Match 1+ times a1
followed by 0, 1 or 2 times a zero
)*?
Optionally repeat the non capture group, non greedy11[01]
Match either 110 or 111