I am trying to figure out how to condense the number of pattern-matching rules I need to write to pick up on phrases like “I saw a man at the SW corner of Pike street.” For “SW corner”, I don’t want to write a separate match rule for every directional variation (N, S, E, W, etc.). I tried to do it with the pattern below, but it isn’t right: it works, but it also picks up other things.
Example:
matcher.add("DIRECTION", None, [{}, {"TEXT":{"REGEX":"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)"}}, {"LOWER":"corner"}] )
I want to use OR (regex alternation), but I am not sure how to do that with the one- and two-character directions (N, S, E, W, NW, NE, SW, SE).
What am I doing wrong?
Answer
Do not overuse non-capturing groups: (?:SW) is the same as SW.
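Also, you do not want SE to match inside a token such as SED, so use the anchors ^ and $. Without them the regex is searched for anywhere in the token text, so any token that merely contains a capital N, S, E or W matches. That is why your rule picks up other things.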
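To see the difference, here is a small sketch that uses Python's re module directly rather than spaCy. It assumes, as I understand spaCy's matcher to behave, that the REGEX operator applies re.search to each token's text; the sample tokens are made up for illustration.

```python
import re

# The pattern from the question: no anchors, and every alternative
# wrapped in its own non-capturing group.
UNANCHORED = r"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)"
# The answer's pattern: one group for the alternation, anchored at both ends.
ANCHORED = r"^(?:N|S|E|W|NW|NE|SW|SE)$"

# Illustrative token texts, not taken from any real document.
tokens = ["SW", "SED", "NEW", "Seattle", "corner", "ne"]

def matching(pattern, texts):
    # re.search looks for a match anywhere in each string.
    return [t for t in texts if re.search(pattern, t)]

print(matching(UNANCHORED, tokens))  # ['SW', 'SED', 'NEW', 'Seattle']
print(matching(ANCHORED, tokens))    # ['SW']
```

The unanchored version accepts any token containing one of the capital letters, while the anchored version only accepts a token that is exactly one of the eight directions.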
Use this as the value of "TEXT" in the token pattern:
{"TEXT": {"REGEX": "^(?:N|S|E|W|NW|NE|SW|SE)$"}}
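The following is a rough pure-Python imitation of the question's three-token pattern [{}, {"TEXT": {"REGEX": ...}}, {"LOWER": "corner"}] using the anchored regex. It is only a sketch: find_corner_phrases is a hypothetical helper, and splitting on whitespace is a simplification of spaCy's tokenizer, which handles punctuation differently.

```python
import re

# Anchored direction regex from the answer.
DIRECTION = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$")

def find_corner_phrases(text):
    """Hypothetical helper mimicking the token pattern
    [{}, {"TEXT": {"REGEX": ...}}, {"LOWER": "corner"}]
    over whitespace-separated tokens."""
    tokens = text.split()
    hits = []
    for i in range(len(tokens) - 2):
        _any, direction, corner = tokens[i:i + 3]
        # {} accepts any token; the middle token must fully match a direction;
        # the last token must be "corner" once lowercased.
        if DIRECTION.search(direction) and corner.lower() == "corner":
            hits.append(" ".join(tokens[i:i + 3]))
    return hits

print(find_corner_phrases("I saw a man at the SW corner of Pike street"))
# ['the SW corner']
print(find_corner_phrases("Meet me at the SED Corner or the NE corner"))
# ['the NE corner']
```

Because of the anchors, "SED Corner" is no longer picked up, while every real direction still matches with a single rule.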
Case-insensitive variant:
{"TEXT": {"REGEX": "(?i)^(?:N|S|E|W|NW|NE|SW|SE)$"}}
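As far as I know, spaCy's REGEX value is just a pattern string, so the inline (?i) flag at the start is the way to make it case-insensitive there. The sketch below checks the behaviour with plain re on made-up samples; outside spaCy, passing re.IGNORECASE to re.compile is the equivalent alternative.

```python
import re

# Inline flag version, as it would appear inside a spaCy pattern string.
CASE_INSENSITIVE = r"(?i)^(?:N|S|E|W|NW|NE|SW|SE)$"

samples = ["SW", "sw", "Sw", "nE", "sed", "north"]
matches = [s for s in samples if re.search(CASE_INSENSITIVE, s)]
print(matches)  # ['SW', 'sw', 'Sw', 'nE']

# Equivalent in plain Python: pass the flag instead of the (?i) prefix.
pattern = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$", re.IGNORECASE)
print([s for s in samples if pattern.search(s)])  # ['SW', 'sw', 'Sw', 'nE']
```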
Explanation
^        the beginning of the string
(?:      group, but do not capture:
  N        'N'
  |        OR
  S        'S'
  |        OR
  E        'E'
  |        OR
  W        'W'
  |        OR
  NW       'NW'
  |        OR
  NE       'NE'
  |        OR
  SW       'SW'
  |        OR
  SE       'SE'
)        end of grouping
$        before an optional \n, and the end of the string
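To double-check the breakdown, this sketch enumerates every one- and two-letter string built from N, S, E and W and keeps those the anchored regex accepts. It also illustrates the "optional \n" part of $: in Python, $ still matches just before a trailing newline, so re.fullmatch or \Z would be stricter if that ever mattered.

```python
import itertools
import re

DIRECTION = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$")

# All 20 strings of length 1 or 2 over the compass letters.
candidates = ["".join(p) for n in (1, 2) for p in itertools.product("NSEW", repeat=n)]

accepted = [c for c in candidates if DIRECTION.search(c)]
print(accepted)  # ['N', 'S', 'E', 'W', 'NE', 'NW', 'SE', 'SW']

# $ matches before a final newline, so this is still a match.
print(bool(DIRECTION.search("SW\n")))  # True
```

Combinations such as "NS", "EN" or "WS" are rejected, so the alternation accepts exactly the eight directions and nothing else.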