
Split a string from a textfile correctly

I have a file (“text.txt”) with a large number of lines (~5,000,000). I’m trying to split a line so that:

2   :   :   PUNCT   sent    _   _   _   _   _3-4    L'Algebra   _   _   _   _   _   _   _   _

becomes these two lines:

2   :   :   PUNCT   sent    _   _   _   _   _
3-4 L'Algebra   _   _   _   _   _   _   _   _

So essentially I want to transform a single line into two lines and write the result to another file. Every place where a line has to be split is marked by the character “_” (underscore) immediately followed by a number or number + “-” + number. I want to split the line into two lines after the character “_” (underscore).

If I try to split the line with this call:

lines = re.split(r"_\d", line)

and then write the list lines to a file, I get this:

2   :   :   PUNCT   sent    _   _   _   _   _
-4 L'Algebra   _   _   _   _   _   _   _   _

How can I do this correctly? Can anyone help me, please?


Answer

Try:

>>> re.split(r"_(?=\d+)", line)

['2   :   :   PUNCT   sent    _   _   _   _   ',
 "3-4    L'Algebra   _   _   _   _   _   _   _   _"]
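
Note that the split in the answer above consumes the underscore itself, so the first piece loses its trailing “_”. If the underscore should stay, as in the desired output in the question, one possible variant is a zero-width split point that uses a lookbehind. This is only a sketch under some assumptions: Python 3.7 or newer (older versions do not split on empty matches), no field that legitimately contains an underscore directly followed by a digit, and a UTF-8 encoded file. The names split_fused_line, fix_file and fixed.txt are made up for illustration.

```python
import re

# Zero-width split point: directly after "_" and directly before a digit
# (the start of an ID such as "3" or "3-4"). Nothing is consumed, so the
# underscore stays on the first piece and the digit on the second.
# Splitting on an empty match needs Python 3.7 or newer.
SPLIT_POINT = re.compile(r"(?<=_)(?=\d)")


def split_fused_line(line):
    """Cut `line` wherever "_" is directly followed by a digit."""
    return SPLIT_POINT.split(line)


def fix_file(src_path, dst_path):
    """Copy src_path to dst_path, writing each fused line as separate lines."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Iterate lazily so the whole file is never held in memory.
        for line in src:
            pieces = split_fused_line(line.rstrip("\n"))
            dst.write("\n".join(pieces) + "\n")
```

Calling fix_file("text.txt", "fixed.txt") would then process the input one line at a time, which matters for a file of roughly 5,000,000 lines. Lines without a fused boundary come back from split_fused_line as a single piece and are written unchanged.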
User contributions licensed under: CC BY-SA