
Split a string by regex and keep the separator as part of the items in Python

I want to split a WhatsApp chat backup text by date and keep the date as part of each message. I tried but couldn't get the exact result I want. If anyone can suggest a way to achieve this, that would be a big help. (I don't know much about regex.)

JavaScript

The above code does the job but keeps the separator as a separate item. What I want is for it to be part of its corresponding message (item):

Current Result

JavaScript

WHAT I WANT

JavaScript


Answer

That happened because your pattern contains a capturing group: re.split keeps the text captured by groups in the resulting list as separate items.
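To see that behavior in isolation, here is a minimal sketch. The sample chat and the timestamp format (`dd/mm/yy, hh:mm`) are assumptions made up for illustration, not the asker's actual data or pattern.

```python
import re

# Hypothetical two-message sample; the timestamp layout is an assumption.
chat = "01/02/20, 10:15 - Alice: Hi\n01/02/20, 10:16 - Bob: Hello"

# Assumed timestamp pattern (illustrative only).
stamp = r"\d{2}/\d{2}/\d{2}, \d{2}:\d{2}"

# With a capturing group, each matched separator shows up as its own item.
with_group = re.split(f"({stamp})", chat)

# Without the group, the separators are dropped entirely.
without_group = re.split(stamp, chat)

print(with_group)
print(without_group)
```

Either way, re.split never glues the separator back onto the text that follows it, which is why a different approach is needed here.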

Your regex makes sense only if your matches can span several lines; otherwise, extracting every line that starts with a time-like pattern would be enough.

That is why I’d suggest

JavaScript

See the Python demo:

JavaScript

Output:

JavaScript

Note the absence of the redundant capturing group, and that there is no * after the positive lookahead, which had made it optional. Whitespace at the end of each match is stripped by the \s* pattern inside the lookahead.

The re.S flag allows . to match any char including line break chars.
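As a rough sketch of the technique described here (match from one timestamp up to the next with re.findall, using a mandatory lookahead that contains \s*, and re.S so that . crosses line breaks): the `STAMP` pattern and the sample chat below are assumptions for illustration, and the pattern is not necessarily the exact one suggested above.

```python
import re

# Hypothetical sample; the second line belongs to Alice's message.
chat = (
    "01/02/20, 10:15 - Alice: Hi\n"
    "this line continues Alice's message\n"
    "01/02/20, 10:16 - Bob: Hello\n"
)

# Assumed timestamp format (dd/mm/yy, hh:mm); adjust to the real backup.
STAMP = r"\d{2}/\d{2}/\d{2}, \d{2}:\d{2}"

# Timestamp, then as little text as possible, up to (but not including)
# optional whitespace followed by the next timestamp or the end of the string.
# No capturing group, so findall returns the whole match for each message.
pattern = re.compile(rf"{STAMP}.*?(?=\s*(?:{STAMP}|\Z))", re.S)

messages = pattern.findall(chat)

for message in messages:
    print(repr(message))
```

One caveat with this sketch: a message whose body happens to contain something that looks like a timestamp would be cut there. Anchoring the timestamp to the start of a line with ^ and the re.M flag is one way to reduce that risk.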
