Regex to split text based on a sentence pattern

I have a text that looks something like: ‘19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text’

I want to split the text based on the 19:54:12 From  X  to  Y: sentence pattern. Ideally the result would look something like this: [‘19:54:12 From  X  to  Y:’, ‘ some text after’, ‘21:08:15 From  A  to  B:’, ‘another text’].

X and Y can be multiple words including symbols. Note that between the time string and the word ‘From’ there’s one space, but after that there are two spaces between elements.

I’m using Python. I’ve managed to split the text based on the time string: re.split(r'(\d{2}:\d{2}:\d{2})+\s', string). However, I’d like it to take into account the word structure that follows, including the colon at the end, and also keep those words together with the time in the output list.

Help much appreciated!

Answer

You can split using this regex, which matches the time string followed by From and all the characters up to the colon:

(\d{2}:\d{2}:\d{2} From  [^:]*:)

In Python:

import re

s = '19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text'
# The capturing group makes re.split keep the matched headers in the result
re.split(r'(\d{2}:\d{2}:\d{2} From  [^:]*:)', s)

Output:

[
 '',
 '19:54:12 From  X  to  Y:',
 ' some text after ',
 '21:08:15 From  A  to  B:',
 'another text'
]

Note that the list starts with an empty string because the split pattern matches at the very beginning of the input; you can remove it with a list comprehension, e.g.

[part for part in re.split(r'(\d{2}:\d{2}:\d{2} From  [^:]*:)', s) if part]
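If it helps, the two steps above can be wrapped into a small helper. This is only a sketch: the names HEADER, split_on_headers and pairs are made up for illustration, and the pairing step assumes the text begins with a header and that every header is followed by some non-empty text.

```python
import re

# Same pattern as in the answer; HEADER is a hypothetical name.
HEADER = re.compile(r'(\d{2}:\d{2}:\d{2} From  [^:]*:)')


def split_on_headers(text):
    """Split text on 'HH:MM:SS From  X  to  Y:' headers, keeping the headers."""
    # The capturing group keeps each header in the result;
    # the filter drops empty strings such as the leading ''.
    return [part for part in HEADER.split(text) if part]


s = '19:54:12 From  X  to  Y: some text after 21:08:15 From  A  to  B:another text'
parts = split_on_headers(s)

# Assumption: headers and messages strictly alternate, starting with a header.
pairs = list(zip(parts[0::2], parts[1::2]))
print(pairs)
```

One thing to keep in mind: because the pattern uses [^:]*, a name that itself contains a colon would end the header early, so the pattern would need adjusting for such input.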
User contributions licensed under: CC BY-SA