Given this string:
s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'
I want to split it into records (each record starts with a date), like this:
['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']
Notice the extra newline between ABC and DEF? That’s the challenge I’m having: I want to keep it inside the record instead of splitting there. I’m thinking I need to split conditionally, only on a newline followed by one of these:
['01/', '02/','03/', '04/', '05/', '06/', '07/', '08/', '09/', '10/', '11/', '12/']
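The conditional split described above can be tried directly by turning the prefix list into a regex lookahead. This is a minimal sketch, assuming those twelve month prefixes are the only thing that ever starts a record; the names `prefixes`, `pattern` and `records` are illustrative.

```python
import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

# The month prefixes listed above (assumed to be the only way a record starts).
prefixes = ['01/', '02/', '03/', '04/', '05/', '06/',
            '07/', '08/', '09/', '10/', '11/', '12/']

# Split on a newline only when one of the prefixes comes right after it.
# The lookahead (?=...) checks for the prefix without consuming it,
# so each record keeps its leading date.
pattern = r'\n(?=' + '|'.join(map(re.escape, prefixes)) + ')'

records = re.split(pattern, s)
print(records)
```

Because the lookahead is zero-width, only the newline itself is removed by the split, and the newline between ABC and DEF survives since "DEF" matches none of the prefixes.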
Is there an easy way to use re.findall this way, or is there a better approach?
Thanks in advance!
Answer
You could split on a newline that is followed by a date, using a lookahead. Something like:
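import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

# Split only on a newline that is immediately followed by an mm/dd/yyyy date;
# the lookahead checks for the date without consuming it.
re.split(r'\n(?=\d{2}/\d{2}/\d{4})', s)
# ['01/03/1988 U/9 Mi', '08/19/1966 ABC\nDEF', '12/31/1999 YTD ABC']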
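Since the question asked about re.findall, the same records can also be matched instead of split. This is a sketch, assuming every record starts with an mm/dd/yyyy date; `record_re` and `records` are illustrative names.

```python
import re

s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\nDEF\n12/31/1999 YTD ABC'

# Match a date, then as little as possible (re.S lets '.' cross newlines),
# stopping just before the next newline-plus-date or at the end of the string.
record_re = re.compile(r'\d{2}/\d{2}/\d{4}.*?(?=\n\d{2}/\d{2}/\d{4}|\Z)', re.S)

records = record_re.findall(s)
print(records)
```

One difference from re.split: findall silently drops any text that appears before the first date, which may or may not be what you want.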
Depending on your data, you may be able to simplify this to just a newline followed by two digits: r'\n(?=\d{2})'
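To illustrate what "depending on your data" means, here is a sketch with a made-up input (the "42 DEF" continuation line is hypothetical) where the shorter pattern and the full date pattern give different results:

```python
import re

# Hypothetical input: the second record has a continuation line that
# happens to start with two digits but is not a date.
s = '01/03/1988 U/9 Mi\n08/19/1966 ABC\n42 DEF\n12/31/1999 YTD ABC'

full = re.split(r'\n(?=\d{2}/\d{2}/\d{4})', s)
short = re.split(r'\n(?=\d{2})', s)

print(full)   # the '42 DEF' line stays inside the second record
print(short)  # '42 DEF' is split off as a record of its own
```

If continuation lines can never start with digits, the shorter pattern is fine; otherwise the full date lookahead is safer.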