I am trying to figure out how to condense the number of pattern-matching rules I need to create to pick up on phrases like "I saw a man at the SW corner of Pike street". For "SW corner", I don't want to write a match rule for every directional variation (N, S, E, W, etc.). I tried the rule below, but it isn't right: it works, but it also picks up other things.
Example:
matcher.add("DIRECTION", None,
[{}, {"TEXT":{"REGEX":"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)"}}, {"LOWER":"corner"}]
)
I want to be able to use OR alternatives, but I am not sure how to do that with the single- and double-character directions (N, S, E, W, NW, NE, SW, SE).
What am I doing wrong?
Answer
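Do not overuse non-capturing groups: (?:SW) matches exactly the same text as SW.
Also, without anchors the pattern matches any token that merely contains one of the alternatives, such as the SE in a SED token. Anchor it with ^ and $ so that the whole token has to be a direction.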
Use
{"REGEX":"^(?:N|S|E|W|NW|NE|SW|SE)$"}
See proof.
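To see why the anchors matter, here is a minimal sketch that uses only Python's re module. It assumes the matcher applies the pattern to each token's text the way re.search does (which is consistent with the SED example above); the token list is made up for illustration.

```python
import re

# The unanchored pattern from the question and the anchored fix.
UNANCHORED = re.compile(r"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)")
ANCHORED = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$")

# Hypothetical token texts, standing in for what a tokenizer might produce.
tokens = ["SW", "N", "SED", "NEW", "Street", "the"]

# Assumption: each token's text is tested with re.search, so an unanchored
# pattern succeeds as soon as one alternative occurs anywhere in the token.
unanchored_hits = [t for t in tokens if UNANCHORED.search(t)]
anchored_hits = [t for t in tokens if ANCHORED.search(t)]

print(unanchored_hits)  # every token containing an upper-case N, S, E or W
print(anchored_hits)    # only tokens that are exactly one of the directions
```

With the unanchored pattern, even "Street" qualifies because of its capital S; with the anchors, only whole-token directions remain.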
Case insensitive variant:
{"REGEX":"(?i)^(?:N|S|E|W|NW|NE|SW|SE)$"}
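If the list of directions might grow, the same case-insensitive pattern can be generated instead of typed by hand. This is only a sketch: DIRECTIONS, PATTERN and is_direction are names invented for the example, and it again assumes re.search-style matching on the token text.

```python
import re

# Hypothetical list of directions; extend it as needed.
DIRECTIONS = ["N", "S", "E", "W", "NW", "NE", "SW", "SE"]

# Build "(?i)^(?:N|S|...|SE)$". re.escape is a precaution in case an entry
# ever contains a regex metacharacter; it leaves plain letters unchanged.
PATTERN = "(?i)^(?:" + "|".join(re.escape(d) for d in DIRECTIONS) + ")$"
direction_re = re.compile(PATTERN)


def is_direction(token_text: str) -> bool:
    """Return True if the whole token is one of the directions, in any case."""
    return direction_re.search(token_text) is not None


for text in ["sw", "Ne", "sed", "corner"]:
    print(text, is_direction(text))
```

The generated string is identical to the case-insensitive pattern above, so it can be dropped into the "REGEX" value of the token pattern as is.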
Explanation
--------------------------------------------------------------------------------
  ^        the beginning of the string
  (?:      group, but do not capture:
    N        'N'
    |        OR
    S        'S'
    |        OR
    E        'E'
    |        OR
    W        'W'
    |        OR
    NW       'NW'
    |        OR
    NE       'NE'
    |        OR
    SW       'SW'
    |        OR
    SE       'SE'
  )        end of grouping
  $        before an optional \n, and the end of the string
--------------------------------------------------------------------------------