
regex convert to object

I’m trying to extract values with regex matches in a textX grammar, as follows:

from textx import metamodel_from_str


def test_get_hosts2():
    grammar = r"""
    config: ( /(?!host)./ | hosts+=host | 'host' )* ;

    host: 'host' host2name=/[0-9a-zA-Z.-]+/ '{'
        (
            'fixed-address' fixed_address=/([0-9]{1,3}.){3}[0-9]{1,3}/';'
            ('option host-name' option_host_name=STRING';')?
            ('option domain-name-servers' option_domain_name_servers=/([0-9]{1,3}.){3}[0-9]{1,3}, ([0-9]{1,3}.){3}[0-9]{1,3}/';')?
            ('option netbios-name-servers' option_netbios_name_servers=/([0-9]{1,3}.){3}[0-9]{1,3}/';')?
            ('option domain-name' option_domain_name=STRING';')?
        )#
    '}'
    ;
    """

    conf_file = r"""
    host corehost.abc.abc.ab {
    fixed-address 172.124.106.10;
    option host-name "hostname.abc.abc.ab";
    option domain-name-servers 123.123.123.120, 123.123.128.142;
    option netbios-name-servers 172.124.106.156;
    option domain-name "abcm1.abc.abc.ab";
    option domain-search "abcm1.abc.abc.ab", "abcmo2.abc.abc.ab", "abcmo.3abc.abc.ab", "abcmo4.abc.abc.ab";
    }

    host corehost2.abc.abc.ab {
    fixed-address 172.124.106.120;
    option host-name "hostname2.abc.abc.ab";
    option domain-name-servers 123.123.123.220, 123.123.128.242;
    option netbios-name-servers 172.124.106.256;
    option domain-name "abcm2.abc.abc.ab";
    option domain-search "abcm2.abc.abc.ab", "abcmo2.abc.abc.ab", "abcm.3abc.abc.ab", "abcm4.abc.abc.ab";
    }

    """
    mm = metamodel_from_str(grammar)
    model = mm.model_from_str(conf_file)
    print(model.hosts)
    # assert len(model.hosts) == 2
    for host in model.hosts:
        print(host)
        print(host.host2name, host.fixed_address, host.option_domain_name_servers, host.option_domain_search)


if __name__ == "__main__":
    test_get_hosts2()

But I can only get single values such as “fixed-address” and “host2name”. For “domain-name-servers” I put the “,” inside the regex, but I don’t think that is the right way because the number of values is not always the same. Could you help me get the values of “domain-name-servers” and “domain-search” with the right regex?

ref: Parsing dhcpd.conf with textX


Answer

The easiest way is to use textX’s repetition modifiers to match a sequence of comma-separated values. Whenever you use a repetition such as zero-or-more or one-or-more, you can add a modifier in square brackets. The most frequently used one is the separator modifier, a match that must occur between each two consecutive elements.

Compared with trying to match everything with a single regex, this has two further benefits:

  • simplicity (easier to maintain)
  • you get a nice Python list of elements so you don’t need to process the matched string further.
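
To illustrate the second point, here is a rough sketch of the post-processing a single-regex match would need. It uses only Python's `re` module. The names `LINE`, `IP` and `PATTERN` and the sample line are made up for illustration and are not part of textX or of the question's code.

```python
import re

# Hypothetical single-regex approach: the whole comma-separated run is
# captured as one string and has to be split by hand afterwards.
LINE = "option domain-name-servers 123.123.123.120, 123.123.128.142;"
IP = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
PATTERN = re.compile(rf"option domain-name-servers ({IP}(?:\s*,\s*{IP})*);")

match = PATTERN.search(LINE)
raw = match.group(1)  # one string, commas and spaces included
servers = [part.strip() for part in raw.split(",")]
print(servers)
```

With a separator modifier the parser produces such a list directly, so the split-and-strip step is not needed.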

The working grammar would be (notice the use of +[','] which means one-or-more with a comma as a separator):

    config: ( /(?!host)./ | hosts+=host | 'host' )* ;

    host: 'host' host2name=/[0-9a-zA-Z.-]+/ '{'
        (
            'fixed-address' fixed_address=ip_addr';'
            ('option' 'host-name' option_host_name=STRING';')?
            ('option' 'domain-name-servers' option_domain_name_servers=ip_addr+[',']';')?
            ('option' 'netbios-name-servers' option_netbios_name_servers=ip_addr+[',']';')?
            ('option' 'domain-name' option_domain_name=STRING+[',']';')?
            ('option' 'domain-search' option_domain_search=STRING+[',']';')?
        )#
    '}';

    ip_addr: /([0-9]{1,3}\.){3}[0-9]{1,3}/;
