I would like to split a string on a delimiter while ignoring a particular pattern. I have lines in a text file that look like this:
ABC | 0 | 567 | my name is | however
TQD | 0 | 567 | my name is | but
GED | 0 | 567 | my name is | haha
I would like to split on “|” but ignore 0 and 567 and grab the rest, i.e.:
['ABC', 'my name is', 'however']
['TQD', 'my name is', 'but']
['GED', 'my name is', 'haha']
Whenever I split, it grabs the two numbers as well. Numbers can occur in other places, but this particular pattern of |0|567| needs to be ignored. I can obviously split on “|” and pop the elements at indices 1 and 2, but I am looking for a better way.
I tried this:
import re
pattern = re.compile(r'\|(?!0|567)')
pattern.split(line)
This yields [ABC|0|567, my name is, however]
Answer
To make the specific numbers, together with their | separators, part of the split sequence:
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')
See this demo at regex101 or a Python demo at tio.run
The (?: non-capturing group ) is repeated with * any number of times, so every 0 or 567 field and the | that follows it is consumed as part of the delimiter.
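As a minimal sketch of how this pattern behaves on the sample lines from the question, assuming the literal pipes are escaped as \| and that each line is handled on its own (the helper name split_fields is hypothetical, not something from the original post):

```python
import re

# Pattern from the answer, with the literal pipes escaped as \|.
# One delimiter = a pipe, followed by any run of "0 |" or "567 |" fields,
# with the surrounding spaces swallowed as well.
PATTERN = re.compile(r' *\|(?: *(?:0|567) *\|)* *')


def split_fields(line):
    """Split one line on '|', treating 0 / 567 fields as part of the delimiter.

    split_fields is a hypothetical helper name used only for illustration.
    """
    return PATTERN.split(line.strip())


lines = [
    "ABC | 0 | 567 | my name is | however",
    "TQD | 0 | 567 | my name is | but",
    "GED | 0 | 567 | my name is | haha",
]

for line in lines:
    print(split_fields(line))
```

Note that this pattern swallows any field that is exactly 0 or 567 when it follows a pipe, not only the exact 0-then-567 pair, so a line such as "A | 0 | B" would come back as ['A', 'B']. If only the exact pair should be ignored, the pattern would need to be tightened.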