Skip to content
Advertisement

Pyparsing recursive type definition in DSL

I am parsing a domain specific language which has several type definitions. Primitive types have been relatively straight-forward to parse, but moving on to the complex types in the DSL have proven more challenging. I am looking for a strategy to recursively match a definition that can contain a reference to itself.

For example, the map data type can look like this at a high level:

map<primitive_type, complex_type>

Which can appear in the input as a recursive definition:

map<string, map<int, map<string, ...>>>=

This map of maps can theoretically be of arbitrary depth.

To make matters more complicated, these nested complex types could also be other complex types as well.

map<string, map<int, map<string, array<array<string>>>>>

I’m looking for a strategy to handle parsing this so that I get an accurate type containing the nested map type parsed out. I considered just using a regex, but this doesn’t feel optimal to me. I also know there exists Forward for recursive grammars, but I haven’t quite been able to figure out how to apply it to this scenario. If there are any good patterns or references for this type of parsing I would really appreciate any guidance!

Advertisement

Answer

Forward is the way to do this. Since type declarations may themselves contain type declarations, this is probably a good choice for where the Forward will go:

import pyparsing as pp
type_decl = pp.Forward()

Looking at your examples, I can see we will need some punctuation, we can define some suppressed expressions for these so that they parse, but won’t junk up the parsed results:

LANGLE, RANGLE, COMMA = map(pp.Suppress, "<>,")

# if we get an ellipsis, we want to keep it, so use Literal instead of Suppress
ELLIPSIS = pp.Literal("...")

Next we can do the simple types:

simple_type = pp.oneOf("string int float", asKeyword=True)

Now the two complex types, map and array. For these, we will use the Forward wherever a complex type might occur:

MAP, ARRAY = map(pp.Keyword, ["map", "array"])
map_type = MAP - LANGLE + pp.Group(simple_type + COMMA + (type_decl | ELLIPSIS)) + RANGLE
array_type = ARRAY - LANGLE + type_decl + RANGLE
complex_type = pp.Group(map_type | array_type)

I use the ‘-‘ operator here since, if there is a syntax error after seeing a valid “map” or “array” keyword, pyparsing will give us the error within that expression.

At this point we have our complex and simple types, so we can “inject” them into the previously-defined Forward, using the <<= operator:

type_decl <<= (complex_type | simple_type)

Check using runTests:

type_decl.runTests("""
    string
    int
    float
    map<string, map<int, map<string, ...>>>
    map<string, map<int, map<string, array<array<string>>>>>
    """, fullDump=False)

Gives:

string
['string']

int
['int']

float
['float']

map<string, map<int, map<string, ...>>>
[['map', ['string', ['map', ['int', ['map', ['string', '...']]]]]]]

map<string, map<int, map<string, array<array<string>>>>>
[['map', ['string', ['map', ['int', ['map', ['string', ['array', ['array', 'string']]]]]]]]]
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement