
Parsing a string as a Python argument list

Summary

I would like to parse a string that represents a Python argument list into a form that I can forward to a function call.

Detailed version

I am building an application in which I want to parse argument lists out of a text string and convert them into the *args, **kwargs pattern so they can be forwarded to an actual method. For example, if my text string is:

"hello",42,helper="Larry, the \"wise\""

the parsed result would be something comparable to:

args=['hello',42]
kwargs={'helper': 'Larry, the "wise"'}

I am aware of Python’s ast module, but it only seems to provide a mechanism for parsing entire statements. I can sort of fake this by manufacturing a statement around it, e.g.

ast.parse(r'f("hello",42,helper="Larry, the \"wise\"")')

and then pull the relevant fields out of the Call node, but this seems like an awful lot of roundabout work.

Is there any way to parse just one known node type from a Python AST, or is there an easier approach for getting this functionality?

If it helps, I only need to be able to support numeric and string arguments, although strings need to support embedded commas and escaped-out quotes and the like.

If there is an existing module for building lexers and parsers in Python, I am fine with defining my own AST as well, but I would obviously prefer to use functionality that already exists and has already been tested.

Note: Many of the answers focus on how to store the parsed results, but that’s not what I care about; it’s the parsing itself that I’m trying to solve, ideally without writing an entire parser engine myself.

Also, my application already uses Jinja, which has a parser for Python-ish expressions as part of its template parser, although it isn’t clear to me how to use it to parse just one subexpression like this. (Unfortunately this is not going into a template but into a custom Markdown filter, where I’d like the syntax to match the corresponding Jinja template function as closely as possible.)


Answer

I think ast.parse is your best option.

If the parameters were separated by whitespace, we could use shlex.split:

>>> shlex.split(r'"hello" 42 helper="Larry, the \"wise\""')
['hello', '42', 'helper=Larry, the "wise"']

But unfortunately, that doesn’t split on commas:

>>> shlex.split(r'"hello",42,helper="Larry, the \"wise\""')
['hello,42,helper=Larry, the "wise"']
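For completeness: the lower-level `shlex.shlex` class, unlike `shlex.split`, can be told to use commas as its only separator. The sketch below (the helper name `split_on_commas` is made up for illustration) gets the quoting and escaping right, but every token comes back as a string, so `42` and `"42"` become indistinguishable and `helper=` stays glued to its value. That is why it doesn't really replace `ast` here.

```python
import shlex


def split_on_commas(text):
    # POSIX mode strips the quotes and honours backslash escapes inside "..."
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ','          # commas, and only commas, separate tokens
    lexer.whitespace_split = True   # keep '=' and similar chars inside tokens
    lexer.commenters = ''           # don't treat '#' as the start of a comment
    return list(lexer)


print(split_on_commas(r'"hello",42,helper="Larry, the \"wise\""'))
# ['hello', '42', 'helper=Larry, the "wise"']
```

Spaces after commas also end up inside the tokens (`"a", 42` yields `' 42'`), since only the comma is treated as whitespace.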

I also thought about using ast.literal_eval, but that doesn’t support keyword arguments:

>>> ast.literal_eval(r'"hello",42')
('hello', 42)
>>> ast.literal_eval(r'"hello",42,helper="Larry, the \"wise\""')
Traceback (most recent call last):
  File "<unknown>", line 1
    "hello",42,helper="Larry, the \"wise\""
                     ^
SyntaxError: invalid syntax

I couldn’t think of any Python literal that supports both positional and keyword arguments.
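If keyword arguments are not needed, `ast.literal_eval` alone does handle the positional part, with one catch: a single argument such as `"hello"` evaluates to a plain string rather than a one-element tuple, and `(1, 2)` would look like two arguments. Wrapping the text as `(...,)` before evaluating, as in this sketch (the name `parse_positional` is hypothetical), keeps the result shape consistent:

```python
import ast


def parse_positional(text):
    # Wrap in '(...,)' so a lone argument still yields a tuple, and a single
    # tuple argument like '(1, 2)' is not mistaken for two separate arguments.
    return list(ast.literal_eval('({},)'.format(text)))


print(parse_positional('"hello"'))     # ['hello']
print(parse_positional('"hello",42'))  # ['hello', 42]
print(parse_positional('(1, 2)'))      # [(1, 2)]
```

An empty string or a trailing comma in the input still raises `SyntaxError` with this wrapping.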


For lack of a better idea, here’s a solution using ast.parse:

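import ast

def parse_args(text):
    # Wrap the text in a dummy call so it parses as a Python statement
    tree = ast.parse('f({})'.format(text))
    # The module's first statement is an Expr node whose value is the Call
    funccall = tree.body[0].value

    # literal_eval accepts AST nodes, so each argument is evaluated safely
    args = [ast.literal_eval(arg) for arg in funccall.args]
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in funccall.keywords}
    return args, kwargs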

Output:

>>> parse_args(r'"hello",42,helper="Larry, the \"wise\""')
(['hello', 42], {'helper': 'Larry, the "wise"'})
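A slightly tighter variant of the same idea, sketched below, passes `mode='eval'` to `ast.parse`. That parses a single expression and returns an `ast.Expression` whose `body` is the call itself, which is arguably the closest the standard `ast` module comes to parsing just one node type. It also rejects `*`/`**` unpacking with a clearer error. The names `parse_call_args` and `describe` are hypothetical, the latter only there to show the result being forwarded with `*args, **kwargs`.

```python
import ast


def parse_call_args(text):
    """Parse 'a, b, key=c' into (args, kwargs), accepting only literal values."""
    # mode='eval' parses one expression and returns an ast.Expression,
    # so there is no statement list to dig through. 'f' is just a dummy name.
    call = ast.parse('f({})'.format(text), mode='eval').body
    if not isinstance(call, ast.Call):
        # e.g. text = '1), g(2' turns the wrapper into a tuple of two calls
        raise ValueError('not a plain argument list: {!r}'.format(text))

    args = []
    for node in call.args:
        if isinstance(node, ast.Starred):  # '*iterable' has no literal value
            raise ValueError('*-unpacking is not supported')
        args.append(ast.literal_eval(node))

    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:  # '**mapping' entries have no keyword name
            raise ValueError('**-unpacking is not supported')
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return args, kwargs


# Hypothetical target function standing in for the real method being called.
def describe(greeting, count, helper=None):
    return '{} x{} ({})'.format(greeting, count, helper)


args, kwargs = parse_call_args(r'"hello", 42, helper="Larry, the \"wise\""')
print(args, kwargs)               # ['hello', 42] {'helper': 'Larry, the "wise"'}
print(describe(*args, **kwargs))  # hello x42 (Larry, the "wise")
```

`ast.literal_eval` also accepts other literals such as lists, dicts, booleans and `None`, so add an `isinstance` check on each value if only numbers and strings should get through. Malformed input surfaces as a `SyntaxError` from `ast.parse` or a `ValueError` from `literal_eval`.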
User contributions licensed under: CC BY-SA