
Python Conditional Split

Given this string:

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

I want to split it on each new record (which starts with a date) like this:

['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']

Notice the extra newline between ABC and DEF? That’s the challenge I’m having: I want to preserve it without splitting there. I’m thinking I need to split conditionally, only where the newline is followed by one of these:

['01/', '02/','03/', '04/', '05/', '06/', '07/', '08/', '09/', '10/', '11/', '12/']

Is there an easy way to use re.findall this way, or is there a better approach?

Thanks in advance!


Answer

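You could split on each newline that is followed by a date, using a lookahead. The lookahead checks for the date without consuming it, so the date stays at the start of the next record. Something like: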

import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'
re.split(r'\n(?=\d{2}/\d{2}/\d{4})', s)

# ['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']

You may be able to simplify to just a newline followed by 2 digits, depending on your data: r'\n(?=\d{2})'
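If the data is messier, going the other way and tightening the pattern might be safer. Here is a rough sketch, not a definitive solution: the DATE pattern and the helper names split_records and find_records are my own, not from the question. It restricts the month to 01-12 (like the list of prefixes in the question) and the day to 01-31, and also shows how the same thing could be done with re.findall, since that was asked about.

```python
import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

# Assumed stricter date pattern: month 01-12, day 01-31, four-digit year.
# It does not check that the day is valid for the month.
DATE = r'(?:0[1-9]|1[0-2])/(?:0[1-9]|[12]\d|3[01])/\d{4}'

def split_records(text):
    # Split only on a newline that is immediately followed by a date.
    return re.split(rf'\n(?={DATE})', text)

def find_records(text):
    # Match a date, then lazily everything (newlines included, via DOTALL)
    # up to the next newline-plus-date or the end of the string.
    return re.findall(rf'{DATE}.*?(?=\n{DATE}|\Z)', text, flags=re.DOTALL)

print(split_records(s))
print(find_records(s))
```

Both calls should give the same three records for the sample string. One difference to be aware of: re.findall silently drops any text before the first date, while re.split keeps it as the first element.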

User contributions licensed under: CC BY-SA