
competing regular expressions (race condition)

I’m trying to use Python PLY (lex/yacc) to parse a language called ‘GRBL’. GRBL looks something like this:

G00 X0.0 Y0.0 Z-1.0
G01 X1.0
..

The ‘G’ Codes tell a machine to ‘go’ (or move) and the coordinates say where.

LEX requires us to specify a unique regular expression for every possible ‘token’.

So in this case I need a regex that will clearly define ‘G00’ and one that will clearly define ‘G01’ etc.

Obviously one’s first thought would be r'G00' etc. However, G-code is not written consistently: the G can be upper or lower case, there can be leading zeros, and so on.

(g0, G00, g001 etc.)

So something for G00 may be as simple as:

r'[Gg]{1}0*'

And for G01 we could have

r'[Gg]{1}0*1'

But this does not work. G00 parses correctly, but G01 gives:

LexToken(G00,'G0',3,21)
Illegal character '1'

That is, lex thinks that G01 is a G00 token and doesn’t know what to do with the ‘1’, which is clearly some sort of greedy-matching problem.

Unfortunately I can’t use the “$” anchor to specify that the string must end with a “1”, since the token is generally followed by more input.

I realise this might seem simple to some, but I’ve been at this for 3 hours and can’t get it to work! Does anyone know how to address this problem?


Answer

Note: There’s pretty well no reason to write {1} in a regular expression. It means that the previous element should be repeated exactly once, which is what would have happened without the repetition operator. So all it does is to obfuscate the regular expression (and slow down matching).

But that’s not your problem. Your problem is likely the order in which Ply applies the regular expressions. Ply creates a single massive Python regular expression by concatenating all patterns into a set of alternatives:

(pattern1)|(pattern2)|(pattern3)|...|(patternz)

The order in which the patterns are inserted is important because Python “regular” expressions use an ordered alternation operator (making them actually irregular in mathematical terms, but that’s a side issue). So once some alternative matches, the following ones are not even tried.
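You can see the same ordered-alternation behaviour with plain re, outside of Ply entirely; the group names below are just labels I’ve picked for the two alternatives:

import re

# Shorter alternative first, the way the combined master pattern
# might have ended up ordered.
master = re.compile(r'(?P<G00>[Gg]0*)|(?P<G01>[Gg]0*1)')

m = master.match('G01')
print(m.lastgroup, repr(m.group()))   # -> G00 'G0'

Swap the two alternatives and the same call reports G01 'G01': whichever pattern comes first wins.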

The Ply manual defines the ordering:

  1. All tokens defined by functions are added in the same order as they appear in the lexer file.
  2. Tokens defined by strings are added next by sorting them in order of decreasing regular expression length (longer expressions are added first).

I’m guessing that you’re using functions, so that the patterns are tried in order of appearance in the file, because your second pattern (which is longer) would be applied first if they were defined as strings. But without seeing your actual file, it’s very hard to know for sure.
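If that guess is right, the quickest fix is simply to reorder the function rules so that the longer, more specific pattern is defined first. A minimal sketch (not your actual file, just the two rules in question):

import ply.lex as lex

tokens = ('G00', 'G01')

# Defined first, so Ply tries this rule first.
def t_G01(t):
    r'[Gg]0*1'
    return t

def t_G00(t):
    r'[Gg]0*'
    return t

t_ignore = ' \t'

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('G00 G01')
for tok in lexer:
    print(tok.type, tok.value)   # G00 'G00', then G01 'G01'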

In any case, conventional wisdom for Ply lexers is to use as few patterns as possible, preferring to map keywords to tokens with dictionaries. In the case of GRBL, one possibility might be to use [Gg][0-9]+(\.[0-9]+)? as the pattern and then extract the index in the semantic action.
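A rough sketch of that single-pattern approach (the token name GCODE and the exact normalisation are my own choices here, not anything mandated by GRBL):

import ply.lex as lex

tokens = ('GCODE',)

def t_GCODE(t):
    r'[Gg][0-9]+(\.[0-9]+)?'
    # Extract the index in the semantic action: 'g01' and 'G1' both
    # become the integer 1, while something like 'G38.2' becomes 38.2.
    number = t.value[1:]
    t.value = float(number) if '.' in number else int(number)
    return t

t_ignore = ' \t\n'

def t_error(t):
    print("Illegal character %r" % t.value[0])
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('G00 g01 G1 G38.2')
for tok in lexer:
    print(tok.type, tok.value)   # GCODE 0, GCODE 1, GCODE 1, GCODE 38.2

The dictionary lookup would come in if you wanted to map particular codes to distinct token types; here the value itself carries the index, and the parser can treat it as data.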
