
Python regex OR on single characters with spaCy pattern matching

I am trying to figure out how I can reduce the number of pattern-matching rules I need to pick up phrases like “I saw a man at the SW corner of Pike street.” For “SW corner”, I don’t want to write a separate rule for every directional variation (N, S, E, W, etc.). I tried the pattern below, but it isn’t right: it does match those cases, but it also picks up other things.

Example:

matcher.add("DIRECTION", None,
           [{}, {"TEXT":{"REGEX":"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)"}}, {"LOWER":"corner"}]
           )

I want to use regex alternation (OR), but I am not sure how to do that with one- and two-character tokens (N, S, E, W, NW, NE, SW, SE).

What am I doing wrong?

Answer

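Do not overuse non-capturing groups: (?:SW) is the same as SW.

Also, you do not want SE to match inside a token such as SED. spaCy's REGEX operator looks for the pattern anywhere in the token text, so with your unanchored alternation any token that merely contains a capital N, S, E or W qualifies. Anchor the pattern with ^ and $ so it has to match the whole token.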
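To see why the original pattern over-matches, here is a minimal check with plain Python `re`, assuming (as spaCy's matcher does for REGEX) that the pattern is applied with a search over each token's text. The token list and the `hits` helper are made up for illustration.

```python
import re

# The question's pattern: eight alternatives, no anchors.
unanchored = re.compile(r"(?:N)|(?:S)|(?:E)|(?:W)|(?:NW)|(?:NE)|(?:SW)|(?:SE)")
# The answer's pattern: one group, anchored to the start and end of the token.
anchored = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$")

# Hypothetical sample tokens, one token per string.
tokens = ["SW", "NE", "SED", "Seattle", "Westlake", "corner"]

def hits(pattern, tokens):
    """Return the tokens in which re.search finds the pattern anywhere."""
    return [t for t in tokens if pattern.search(t)]

print(hits(unanchored, tokens))  # ['SW', 'NE', 'SED', 'Seattle', 'Westlake']
print(hits(anchored, tokens))    # ['SW', 'NE']
```

Both versions accept the real directions; only the unanchored one also lets SED, Seattle and Westlake through, because each of them contains one of the capital letters somewhere.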

Use this as the direction token in your pattern:

{"TEXT": {"REGEX": "^(?:N|S|E|W|NW|NE|SW|SE)$"}}
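Here is the anchored pattern inside a complete matcher, as a sketch. It assumes spaCy v3, where `Matcher.add` takes a list of patterns (the question's `matcher.add("DIRECTION", None, pattern)` call is the older v2 form), and uses `spacy.blank("en")` so no trained model has to be downloaded. I dropped the question's leading `{}` wildcard so each match starts at the direction token; put it back if you also want the preceding word. The `corner_spans` helper is hypothetical.

```python
import spacy
from spacy.matcher import Matcher

# A blank English pipeline is enough: the Matcher only needs tokens.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# The whole token text must be one of the eight directions.
direction = {"TEXT": {"REGEX": "^(?:N|S|E|W|NW|NE|SW|SE)$"}}
# spaCy v3 signature: matcher.add(key, list_of_patterns)
matcher.add("DIRECTION", [[direction, {"LOWER": "corner"}]])

def corner_spans(doc):
    """Return the text of every matched 'direction corner' span."""
    return [doc[start:end].text for _, start, end in matcher(doc)]

doc = nlp("I saw a man at the SW corner of Pike street. "
          "He stood at the Westlake corner.")
print(corner_spans(doc))  # ['SW corner']
```

With the unanchored pattern from the question, "Westlake corner" would match as well, since Westlake contains a W.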


Case-insensitive variant:

{"TEXT": {"REGEX": "(?i)^(?:N|S|E|W|NW|NE|SW|SE)$"}}
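The `(?i)` flag makes the check case-insensitive, so sw, Sw and SW are all accepted. If the eight-way alternation feels long, the same set can also be written with character classes; the `compact` pattern below is my own rewrite, not part of the original answer, and the hypothetical `same_language` helper checks that the two agree on every string of up to three characters over a small alphabet.

```python
import re
from itertools import product

# The answer's case-insensitive variant.
alternation = re.compile(r"(?i)^(?:N|S|E|W|NW|NE|SW|SE)$")
# Alternative spelling: N or S optionally followed by E or W, or E or W alone.
compact = re.compile(r"(?i)^(?:[NS][EW]?|[EW])$")

def same_language(a, b, alphabet="NSEWnsewX", max_len=3):
    """Check that a and b accept exactly the same strings up to max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(a.search(s)) != bool(b.search(s)):
                return False
    return True

print(bool(alternation.search("sw")))       # True
print(same_language(alternation, compact))  # True
```

If you do not need a regex at all, spaCy's IN set operator does the same job on the lowercased token: {"LOWER": {"IN": ["n", "s", "e", "w", "nw", "ne", "sw", "se"]}}.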

Explanation

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    N                        'N'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    S                        'S'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    E                        'E'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    W                        'W'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    NW                       'NW'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    NE                       'NE'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    SW                       'SW'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    SE                       'SE'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
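The last line of that explanation is easy to miss: in Python, $ also matches just before a newline at the very end of the string. A small check, assuming Python's `re` semantics; for spaCy token text this rarely matters, but `\Z` or `re.fullmatch` give a strictly whole-string match if you ever need one.

```python
import re

# ^ and $ anchors, as in the answer.
dollar = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)$")
# \Z matches only at the absolute end of the string.
strict = re.compile(r"^(?:N|S|E|W|NW|NE|SW|SE)\Z")

print(bool(dollar.search("SW\n")))  # True: $ matches before the final newline
print(bool(strict.search("SW\n")))  # False
print(bool(re.fullmatch(r"(?:N|S|E|W|NW|NE|SW|SE)", "SW\n")))  # False
```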
User contributions licensed under: CC BY-SA