This is an example list of strings:

    new_text = ['XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer', '?', '?) XIC(Motor_timer.DN)OTE(Motor)']
I would like to extract XIC(Switch_A) into one list, OTE(Light1) into another list, TON(Motor_timer) into another list, and so on.
This is the code I have tried in Python 3:

    import re

    for words in new_text:
        match = re.search('XIC(.*)', words)
        if match:  # re.search returns None for strings with no match, such as '?'
            print(match.group(1))
How do I go about extracting OTE(Tag name), XIC(Tag name), and XIO(Tag name) into their own lists or groups?
Answer
You can use the following regex to match any three uppercase letters, followed by anything in parentheses:
    ([A-Z]{3})\(([^)]+)\)

    ([A-Z]{3}) : Capturing group 1, the identifier: exactly three uppercase letters
    \( \)      : Literal open/close parentheses
    ([^)]+)    : Capturing group 2, the tag name: one or more of any character that is not )
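The escaping matters, and it also explains why the XIC(.*) attempt in the question does not work. The comparison below runs both patterns on a shortened sample (the first part of the question's first string). It is a sketch for illustration; the variable names are arbitrary.

```python
import re

s = "XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2)"

# Unescaped: "(" starts a capturing group, and the greedy ".*"
# swallows everything after the first "XIC" up to the end of the string.
naive = re.search("XIC(.*)", s).group(1)
print(naive)  # → (Switch_A)OTE(Light1) XIC(Light1)OTE(Light2)

# Escaped parentheses match literally, and [^)]+ stops at the first ")".
pattern = re.compile(r"([A-Z]{3})\(([^)]+)\)")
pairs = pattern.findall(s)
print(pairs)  # → [('XIC', 'Switch_A'), ('OTE', 'Light1'), ('XIC', 'Light1'), ('OTE', 'Light2')]
```

Because the pattern has two capturing groups, findall returns a list of (identifier, tag) tuples rather than plain strings.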
Use a collections.defaultdict to keep track of all your results. The identifier is the key, and each value is a list of all the matches for that identifier.
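    import re
    from collections import defaultdict

    results = defaultdict(list)
    regex = re.compile(r"([A-Z]{3})\(([^)]+)\)")

    for s in new_text:
        matches = regex.findall(s)  # list of (identifier, tag) tuples
        for identifier, tag in matches:
            # Rebuild the full text, e.g. "XIC(Switch_A)"
            results[identifier].append(f"{identifier}({tag})")

    print(dict(results))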
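This gives the following results:

    {'XIC': ['XIC(Switch_A)', 'XIC(Light1)', 'XIC(Light1)', 'XIC(Motor_timer.DN)'], 'OTE': ['OTE(Light1)', 'OTE(Light2)', 'OTE(Motor)']}

TON(Motor_timer) does not appear because, in new_text, its opening and closing parentheses end up in different list elements, so no single string contains the complete TON(...) text.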
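If you want the full matched text without rebuilding it from the captured pieces, one alternative is re.finditer, which yields match objects whose group(0) is the whole XIC(...) substring. This is a sketch using the same regex and the question's new_text, not a change to the approach above.

```python
import re
from collections import defaultdict

new_text = ['XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer', '?', '?) XIC(Motor_timer.DN)OTE(Motor)']

regex = re.compile(r"([A-Z]{3})\(([^)]+)\)")
results = defaultdict(list)

for s in new_text:
    for match in regex.finditer(s):
        # group(1) is the identifier; group(0) is the entire "XIC(...)" match
        results[match.group(1)].append(match.group(0))

print(dict(results))
# → {'XIC': ['XIC(Switch_A)', 'XIC(Light1)', 'XIC(Light1)', 'XIC(Motor_timer.DN)'], 'OTE': ['OTE(Light1)', 'OTE(Light2)', 'OTE(Motor)']}
```

The output is the same as with findall; the difference is only that the original substring is reused instead of reassembled with an f-string.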
Since you have a fixed set of identifiers, you can replace the [A-Z]{3} portion of the regex with an alternation that matches only your identifiers:
    regex = re.compile(r"(XIC|XIO|OTE|TON|TOF)\(([^)]+)\)")
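To illustrate the difference, the sketch below runs both patterns on a made-up line; ABC(Note) is a hypothetical stand-in for any three-letter token that is not one of your instructions.

```python
import re

generic = re.compile(r"([A-Z]{3})\(([^)]+)\)")
restricted = re.compile(r"(XIC|XIO|OTE|TON|TOF)\(([^)]+)\)")

# Hypothetical input: "ABC(Note)" is not a real instruction
line = "XIO(Stop)ABC(Note) TON(Motor_timer)"

print(generic.findall(line))     # → [('XIO', 'Stop'), ('ABC', 'Note'), ('TON', 'Motor_timer')]
print(restricted.findall(line))  # → [('XIO', 'Stop'), ('TON', 'Motor_timer')]
```

The generic pattern picks up ABC(Note); the restricted one only reports the listed identifiers.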
It is also possible to build this regex if you have your identifiers in an iterable:
    identifiers = ["XIC", "XIO", "OTE", "TON", "TOF"]
    regex = re.compile(rf"({'|'.join(identifiers)})\(([^)]+)\)")
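Once the identifiers live in a list, you can also use that list to pre-create one (possibly empty) list per identifier, so an identifier that never occurs, such as XIO here, still has its own list. This is a minimal sketch assuming the question's new_text; wrapping each identifier in re.escape is an extra precaution for names containing regex metacharacters, not something the listed identifiers need.

```python
import re

identifiers = ["XIC", "XIO", "OTE", "TON", "TOF"]
# re.escape protects against identifiers containing characters like "." or "+"
regex = re.compile(rf"({'|'.join(map(re.escape, identifiers))})\(([^)]+)\)")

new_text = ['XIC(Switch_A)OTE(Light1) XIC(Light1)OTE(Light2) Motor On Delay Timer XIC(Light1)TON(Motor_timer', '?', '?) XIC(Motor_timer.DN)OTE(Motor)']

# One list per identifier, created up front so missing identifiers map to []
groups = {name: [] for name in identifiers}
for s in new_text:
    for match in regex.finditer(s):
        groups[match.group(1)].append(match.group(0))

print(groups["XIC"])  # → ['XIC(Switch_A)', 'XIC(Light1)', 'XIC(Light1)', 'XIC(Motor_timer.DN)']
print(groups["XIO"])  # → []
```

A plain dict works here instead of a defaultdict because every key the regex can produce is already present.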