I would like to split a string on a delimiter while ignoring a particular pattern. I have lines in a text file that look like this:
"""ABC | 0 | 567 | my name is | however
TQD | 0 | 567 | my name is | but
GED | 0 | 567 | my name is | haha"""
I would like to split on "|" but ignore 0 and 567 and keep the rest, i.e.:
['ABC', 'my name is', 'however']
['TQD', 'my name is', 'but']
['GED', 'my name is', 'haha']
Whenever I split, it grabs the two numbers as well. Numbers can occur in other places, but this particular pattern of |0|567| needs to be ignored. I could obviously split on "|" and pop the elements at indexes 1 and 2, but I am looking for a better way.
I tried this:
import re
pattern = re.compile(r'\|(?!0|567)')
pattern.split(line)
This yields [ABC|0|567, my name is, however]
Answer
To include the specific numbers, together with their surrounding | characters, in the delimiter that is split on:
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')
See this demo at regex101 or a Python demo at tio.run
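The (?: ... ) non-capturing group is repeated with * any number of times, including zero, so a run such as | 0 | 567 | is consumed as a single delimiter. The group has to be non-capturing because re.split also returns the text of any capturing groups. The  * on either side of the delimiter strips the spaces around each field.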
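As a rough check, the pattern can be run against the three sample lines from the question. The sketch below assumes each line is handled on its own and that the pipe in the pattern is escaped so it matches a literal "|". The helper `split_fields` and the `strict` pattern are hypothetical names added for illustration and are not part of the answer. `strict` only skips the exact sequence 0 followed by 567, which may be closer to the requirement that numbers elsewhere are kept.

```python
import re

# Sample lines from the question.
lines = [
    "ABC | 0 | 567 | my name is | however",
    "TQD | 0 | 567 | my name is | but",
    "GED | 0 | 567 | my name is | haha",
]

# Pattern from the answer: a pipe, optionally followed by any run of
# "0" or "567" fields, all treated as one delimiter.
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')

# Hypothetical stricter variant (an assumption, not from the answer):
# only the exact sequence "| 0 | 567 |" is swallowed by the delimiter.
strict = re.compile(r' *\|(?: *0 *\| *567 *\|)? *')


def split_fields(line, regex=pattern):
    """Split one line into fields, dropping the ignored number pattern."""
    return regex.split(line.strip())


for line in lines:
    print(split_fields(line))
```

The two patterns differ on input such as "A | 0 | B | 567 | C": the answer's pattern drops both numbers wherever they appear as whole fields, while the stricter variant keeps them because they are not adjacent.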