I have a question about how to use regex in the following situation (any other Python solution is fine too):
I want to split on a colon ‘:’ when it appears outside a quoted string, but not when it is inside one, as in the example below:
Regex I use: (?!\B"[^"]*):(?![^"]*"\B)
string_to_split: str = '"A: String 1": "B: String 2": C: "D: String 4"'
Output > ["A: String 1", "B: String 2", 'C', "D: String 4"]
This gives what I expected, but it stops working if the first character inside a quoted string is not a letter or a number (the regex won’t split when a string starts with a symbol, a space, etc.), as in this one:
string_to_split: str = '"A: String 1": "B: String 2": C: " D: String 4"'
(space before letter ‘D’)
Output > ["A: String 1", "B: String 2": C: " D: String 4"]
I’m doing this because I want to get more comfortable using regex in Python (I barely use regex when coding). I think it might need a look-ahead or look-behind, but I don’t know much about them. I’d really appreciate any solution, thank you.
Answer
Would you please try the following:
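import re

pat = r'(?:[^:]*"[^"]+"[^:]*)|[^:]+'
string_to_split = '"A: String 1": "B: String 2": C: " D: String 4"'  # renamed from str to avoid shadowing the built-in

# Find each field, then trim the surrounding whitespace
m = [x.strip() for x in re.findall(pat, string_to_split)]
# m = [x.strip('" ') for x in re.findall(pat, string_to_split)]  # removes double quotes too
print(m)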
Output:
['"A: String 1"', '"B: String 2"', 'C', '" D: String 4"']
- The regex pat matches any sequence of characters other than a colon, while allowing colons inside double quotes.
- The regex leaves the leading/trailing whitespace in each match, which is then removed by strip().
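To make the two alternatives in the pattern easier to see, here is a small sketch of the same regex written in verbose mode, printing the raw matches before and after stripping. The variable names (text, raw, fields) are only illustrative.

```python
import re

# Same pattern as in the answer, written in verbose mode so each part can be commented.
pat = re.compile(r'''
    (?: [^:]* "[^"]+" [^:]* )   # a quoted chunk, with optional non-colon text around it
    |                           # or
    [^:]+                       # a plain run of characters containing no colon
''', re.VERBOSE)

text = '"A: String 1": "B: String 2": C: " D: String 4"'

raw = pat.findall(text)             # matches still carry the surrounding whitespace
fields = [x.strip() for x in raw]   # trim that whitespace from each match

print(raw)
print(fields)
```

The quoted alternative is tried first, so a colon between double quotes is consumed as part of the quoted chunk instead of ending the match.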
If you want to remove the surrounding double quotes as well, apply strip('" ') instead. Then the output will be:
['A: String 1', 'B: String 2', 'C', 'D: String 4']
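Note that strip('" ') also removes the space inside the quotes of " D: String 4". If that inner whitespace matters, one possible alternative is to skip regex and scan the characters while tracking whether you are inside quotes. The following is only a rough sketch: the helper names split_outside_quotes and unquote are made up for illustration, and it assumes quotes are balanced and never escaped.

```python
def split_outside_quotes(text, sep=":", quote='"'):
    """Split text on sep, ignoring any sep that appears between quote characters."""
    fields = []
    current = []
    in_quotes = False
    for ch in text:
        if ch == quote:
            in_quotes = not in_quotes      # toggle state at every quote character
            current.append(ch)
        elif ch == sep and not in_quotes:
            # separator outside quotes: close the current field
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())  # last field has no trailing separator
    return fields


def unquote(field, quote='"'):
    """Drop one pair of surrounding quotes, keeping whatever is inside them."""
    if len(field) >= 2 and field[0] == quote and field[-1] == quote:
        return field[1:-1]
    return field


text = '"A: String 1": "B: String 2": C: " D: String 4"'
fields = split_outside_quotes(text)
unquoted = [unquote(f) for f in fields]
print(fields)
print(unquoted)
```

Here the last field comes out as ' D: String 4', with its leading space kept, because only the quote characters themselves are removed.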