I have a file (“text.txt”) with about 5,000,000 lines. I’m trying to split a line so that:
2 : : PUNCT sent _ _ _ _ _3-4 L'Algebra _ _ _ _ _ _ _ _
becomes these two lines:
2 : : PUNCT sent _ _ _ _ _
3-4 L'Algebra _ _ _ _ _ _ _ _
So essentially I want to transform a single line into two lines and write the result to another file. Every line that has to be split contains the character “_” (underscore) immediately followed by a number or number+”-“+number. I want to split the line into two lines right after that “_” (underscore).
If I try to split the line with this function:
lines = re.split(r"_\d", line)
and write the list lines to a file, I get this:
2 : : PUNCT sent _ _ _ _ _
-4 L'Algebra _ _ _ _ _ _ _ _
How can I do this correctly? Can anyone help me please?
Answer
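Try a lookahead, which requires a digit after the underscore without consuming that digit:
>>> import re
>>> re.split(r"_(?=\d+)", line)
['2 : : PUNCT sent _ _ _ _ ', "3-4 L'Algebra _ _ _ _ _ _ _ _"]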
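Note that this split still removes the underscore it matches, so the first piece ends with four underscores rather than five. If the underscore should stay on the first line, as in the desired output in the question, one option is to split on a zero-width position instead. The following is a minimal sketch, not a definitive solution: it assumes Python 3.7 or later (where `re.split` accepts patterns that match an empty string), the helper names `split_line` and `split_file` are made up for illustration, and it assumes an underscore directly followed by a digit never occurs anywhere else in a line. It reads the input line by line, so the 5,000,000-line file is never loaded into memory at once.

```python
import re

# Zero-width split point: right after an underscore, right before a digit.
# Nothing is consumed, so both the "_" and the digit are kept.
SPLIT_POINT = re.compile(r"(?<=_)(?=\d)")


def split_line(line):
    """Return the pieces of `line`, cut after every '_' that precedes a digit."""
    return SPLIT_POINT.split(line)


def split_file(src_path, dst_path):
    """Stream src_path line by line and write each piece on its own line."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            for piece in split_line(line.rstrip("\n")):
                dst.write(piece + "\n")
```

Calling `split_file("text.txt", "out.txt")` (the output name is only an example) would write the split lines to a new file and leave lines without a split point unchanged.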