Considering the following grammar:
expr : expr '+' term | expr '-' term | term term : term '*' factor | term '/' factor | factor factor : '(' expr ')' | identifier | number
This is my code using ply:
from ply import lex, yacc tokens = [ "identifier", "number", "plus", "minus", "mult", "div" ] t_ignore = r" t" t_identifier = r"^[a-zA-Z]+$" t_number = r"[+-]?(d+(.d*)?|.d+)([eE][+-]?d+)?" t_plus = r"+" t_minus = r"-" t_mult = r"*" t_div = r"/" def p_stmt(p): """stmt : expr""" p[0] = ("stmt", p[1]) def p_expr(p): """expr : expr plus term | expr minus term | term""" p[0] = ("expr", p[1], p[2]) # Problem here <<< def p_term(p): """term : term mult factor | term div factor | factor""" def p_factor(p): """factor : '(' expr ')' | identifier | number""" if __name__ == "__main__": lex.lex() yacc.yacc() data = "32 + 10" result = yacc.parse(data) print(result)
How am I supposed to build an AST with the expression if I can’t access the operators? I could separate the functions like p_expr_plus, but in this case, I would eliminate operator precedence. The docs are not so helpful, since I’m a beginner and can’t solve this problem. The best material I’ve found on the subject is this, but it does not consider the complexity of operator precedence.
EDIT: I can’t access p2 or p[3], since I get an IndexError (It’s matching the term only). In the PDF I’ve linked, they explicitly put the operator inside the tuple, like: (‘+’, p1, p2), and thus, evincing my problem considering precedence (I can’t separate the functions, the expression is the expression, there should be a way to consider the pipes and access any operator).
Advertisement
Answer
As far as I can see, in p[0] = ("expr", p[1], p[2])
, p[1]
would be the left hand expression, p[2]
would be the operator, and p[3]
(that you aren’t using) would be the right hand term.
Just use p[2]
to determine the operator, add p[3]
, since you will need it, and you should be good to go.
Also, you must verify how many items p
has, since if the last rule, | term"""
is matched, p
will only have two items instead of four.
Take a look at a snippet from the GardenSnake example:
def p_comparison(p): """comparison : comparison PLUS comparison | comparison MINUS comparison | comparison MULT comparison | comparison DIV comparison | comparison LT comparison | comparison EQ comparison | comparison GT comparison | PLUS comparison | MINUS comparison | power""" if len(p) == 4: p[0] = binary_ops[p[2]]((p[1], p[3])) elif len(p) == 3: p[0] = unary_ops[p[1]](p[2]) else: p[0] = p[1]