I have a string that contains a number of keywords. I would like to split the string into a list at those keywords, but keep each keyword in the output because it identifies what the data that follows it means.
Take the following string for example:
test_string = "ªttypmp3pfilfDjTunes/DJ Music/(I've Had) The Time Of My Life.mp3tsng<(I've Had) The Time Of My Lifetart:Bill Medley & Jennifer Warnes"
The important keywords are “ttyp”, “pfil”, “tsng”, and “tart”. I would like to split the string so the output looks like this:
split_test_string = ["ª","ttypmp3","pfilfDjTunes/DJ Music/(I've Had) The Time Of My Life.mp3","tsng<(I've Had) The Time Of My Life","tart:Bill Medley & Jennifer Warnes"]
I’ve been researching regular expressions, and I think this expression should work, but when I test it in Python, I end up losing the part that I want to keep. According to the Python re.split documentation, this should work.
Check out my regex on regex101: https://regex101.com/r/FOlgv8/1
Note: I’m trying to get the first part to work. Then I’ll add the rest of the keywords using |.
regex = r'(?=ttyp).*'
This is my example code:
import re
regex = r'(?=ttyp).*'
split_test_string = re.split(regex, test_string)
print(f"Results: {split_test_string}")
Console Output:
Results: ['ª', '']
I’ve tried positive lookahead and positive lookbehind with no luck. I could just split on a literal ‘ttyp’, but then I lose the keyword.
Any help would be appreciated, I’ve been researching, trial and erroring (mostly erroring) for hours now.
Answer
Here ya go:
re.split(r"(?=ttyp|pfil|tsng|tart)", test_string)
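The reason yours didn’t work is that your pattern ends in .*, so the match starts at the keyword and runs to the end of the string. re.split treats everything the pattern matches as the separator and discards it, which is why you got ['ª', '']. With only the lookahead, the match is zero-width, so nothing is thrown away. Note that splitting on an empty match like this needs Python 3.7 or later.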
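To see the difference side by side, here is a minimal sketch that runs both patterns against the string from the question. The variable names consumed and kept are just illustrative, and it assumes Python 3.7 or later, where re.split can split on zero-width matches.

```python
import re

# The string from the question, split across two literals for readability.
test_string = (
    "ªttypmp3pfilfDjTunes/DJ Music/(I've Had) The Time Of My Life.mp3"
    "tsng<(I've Had) The Time Of My Lifetart:Bill Medley & Jennifer Warnes"
)

# Original attempt: the lookahead itself consumes nothing, but ".*" then
# consumes the rest of the string, so all of it is treated as the separator.
consumed = re.split(r"(?=ttyp).*", test_string)

# Lookahead only: each match is zero-width, so the string is cut just before
# every keyword and no characters are dropped (assumes Python 3.7+).
kept = re.split(r"(?=ttyp|pfil|tsng|tart)", test_string)

print(consumed)  # ['ª', '']
for part in kept:
    print(part)
```

If the list of keywords grows, one option is to build the alternation with "|".join(map(re.escape, keywords)) rather than typing it by hand. Keep in mind that this approach also splits wherever a keyword happens to appear inside the data itself.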
