Given this string:
s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'
I want to split it on each new record (which starts with a date) like this:
['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']
Notice the extra newline between ABC and DEF? That’s the challenge I’m having: I want to preserve it without splitting there. I’m thinking I need to split only when the newline is followed by one of these:
['01/', '02/','03/', '04/', '05/', '06/', '07/', '08/', '09/', '10/', '11/', '12/']
Is there an easy way to use re.findall this way, or is there a better approach?
Thanks in advance!
Answer
You could split on each newline that is followed by a date, using a lookahead. Something like:
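import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

# Split only on a newline that is immediately followed by MM/DD/YYYY.
# The lookahead (?=...) checks for the date without consuming it.
re.split(r'\n(?=\d{2}/\d{2}/\d{4})', s)
# ['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']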
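If you want the lookahead to accept only the month prefixes listed in the question (01 through 12), one possible variation is sketched below. The names `RECORD_START` and `split_records` are made up for illustration, and the sketch assumes dates are always zero-padded MM/DD/YYYY.

```python
import re

# Hypothetical helper, not part of the original answer.
# The lookahead only accepts months 01-12, mirroring the delimiter
# list in the question; the day and year are just checked for shape.
RECORD_START = re.compile(r'\n(?=(?:0[1-9]|1[0-2])/\d{2}/\d{4})')

def split_records(text):
    """Split text on newlines that begin a new date-prefixed record."""
    return RECORD_START.split(text)

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'
records = split_records(s)
print(records)
# ['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']
```

Precompiling the pattern is optional; it mainly helps if you call the helper on many strings.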
You may be able to simplify to just a newline followed by 2 digits, depending on your data: r'\n(?=\d{2})'
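To see why the simplification depends on your data, here is a small comparison. The sample string is invented for illustration: it has a continuation line that happens to start with two digits, which the shorter pattern treats as a new record.

```python
import re

strict = r'\n(?=\d{2}/\d{2}/\d{4})'  # newline followed by a full date
loose = r'\n(?=\d{2})'               # newline followed by any two digits

# Made-up sample: the second line is a continuation, not a new record.
sample = '08/19/1966 ABC\n42 DEF\n12/31/1999 YTD'

strict_result = re.split(strict, sample)
loose_result = re.split(loose, sample)

print(strict_result)  # ['08/19/1966 ABC\n42 DEF', '12/31/1999 YTD']
print(loose_result)   # ['08/19/1966 ABC', '42 DEF', '12/31/1999 YTD']
```

If continuation lines in your data never start with digits, both patterns give the same result and the shorter one is fine.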