
Split on delimiter and ignore a pattern

I would like to split a string based on a delimiter and ignore a particular pattern. I have lines in a text file that look like so

 """ABC | 0 | 567 | my name is | however
  TQD | 0 | 567 | my name is | but
  GED | 0 | 567 | my name is | haha"""

I would like to split on “|” but ignore 0 and 567 and grab the rest, i.e.

['ABC', 'my name is', 'however']
['TQD', 'my name is', 'but']
['GED', 'my name is', 'haha']

Whenever I split, it grabs the two numbers as well. Numbers can occur in other places, but this particular pattern of |0|567| needs to be ignored. I can obviously split on “|” and pop the elements at index 1 and 2, but I am looking for a better way.

I tried this:

import re
pattern = re.compile(r'\|(?!0|567)')
pattern.split(line) 

This yields ['ABC|0|567', 'my name is', 'however']


Answer

To make the specific numbers, together with their surrounding |, part of the split delimiter:

pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')

See this demo at regex101 or a Python demo at tio.run


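The outer (?: non-capturing group ) matches one ignored field (0 or 567) plus the | that follows it. It is repeated with * any number of times, including zero, so each ignored field is consumed as part of the delimiter. The literal | has to be escaped as \| because an unescaped | means alternation in a regex.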
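As a quick check, here is a minimal sketch that applies the pattern to the three sample lines from the question. The `lines` and `rows` names are just for illustration, and the sketch assumes each line is handled on its own after stripping leading and trailing whitespace.

```python
import re

# Delimiter: optional spaces, a literal pipe, then zero or more
# "0 |" / "567 |" fields, then optional trailing spaces.
pattern = re.compile(r' *\|(?: *(?:0|567) *\|)* *')

# Sample lines taken from the question (illustrative variable name).
lines = [
    "ABC | 0 | 567 | my name is | however",
    "TQD | 0 | 567 | my name is | but",
    "GED | 0 | 567 | my name is | haha",
]

# Strip each line first so stray indentation does not end up in the first field.
rows = [pattern.split(line.strip()) for line in lines]

for row in rows:
    print(row)
# ['ABC', 'my name is', 'however']
# ['TQD', 'my name is', 'but']
# ['GED', 'my name is', 'haha']
```

Note that this pattern drops any field that is exactly 0 or 567 and is followed by another |, not only the pair right after the first column. If those values can legitimately appear as whole fields elsewhere in a line, the pattern would need to be tightened.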

User contributions licensed under: CC BY-SA