Python convert switch data (text) to dict

Question

I have the following data, which I receive via an SSH session to a switch. I wish to convert the input, which is text, to a dict for easy access and the possibility to monitor certain values. I cannot get the data extracted without a ton of splits and regexes, and I still get stuck. This is what I want to convert it to:

Accepted Answer

Updated (complete rewrite and simplification).

Here are some ideas for you; adjust to taste.

The solution herein tries to avoid using "domain-specific knowledge" as much as possible. The only assumptions are:

- Empty lines don't matter.
- Indentation is meaningful.
- Keys are transformed to lowercase, and some content is removed (anything in parentheses, 'name', 'threshold', and anything from a '/' up to the next space).
- When a line has multiple "key : value" pairs or is followed by an indented group of lines, that is a block of information pertaining to the first key.

Ultimately, when a key has multiple values (e.g. 'port'), these values are put together as a list. When a key has a value that is a single dict (like for 'temp'), the first key of that dict (the same as the key itself) is replaced by 'value'. Thus, we will see:

{'port': [{'port': 1, ...}, {'port': 2, ...}, ...]}

but

{'temp': {'value': 37, ...}}

Records

We start by splitting each line into (key, value) pairs and noting the indentation of the line.
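Measuring the indentation is the simplest part of this step. One caveat: proc_line(), shown below, replaces every tab with four spaces, which over-counts when spaces and tabs are mixed, whereas Python's str.expandtabs() pads to the next tab stop. A minimal sketch, assuming a tab width of 4; indent_of is a hypothetical helper, not part of the answer:

```python
def indent_of(line, tabsize=4):
    """Count leading spaces after expanding tabs to tab stops.

    Hypothetical helper; tabsize=4 is an assumption about the switch output.
    """
    expanded = line.expandtabs(tabsize)
    return len(expanded) - len(expanded.lstrip(' '))


print(indent_of('    Vendor : VENDORX'))  # 4
print(indent_of('\tVendor : VENDORX'))    # 4
print(indent_of('  \tVendor : VENDORX'))  # 4 (replacing the tab with 4 spaces gives 6)
```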
proc_line() below does this for one line, returning None for lines without a colon. Applied to every line, it yields a list of records, each of the form (indent, [(k0, v0), ...]):

```python
import re

def proc_kv(k, v):
    # normalize the key: lowercase, drop '(...)', ' name' / ' threshold' and '/...'
    k = re.sub(r'\(.*\)', '', k.lower())
    k = re.sub(r' (?:name|threshold)', '', k)
    k = re.sub(r'/\S+', '', k)
    k = '_'.join(k.strip().split())
    # convert the value to int or float when possible, else keep the string
    for typ in (int, float):
        try:
            v = typ(v)
            break
        except ValueError:
            pass
    return k, v

def proc_line(s):
    s = re.sub(r'\t', ' ' * 4, s)  # handle tabs if any
    # split into one or more key-value pairs
    p = [e.strip() for e in re.split(r':', s)]
    if len(p) < 2:
        return None
    # if there are several pairs, use the largest space
    # to split '{v[i]} {k[i+1]}'
    p = [p[0]] + [
        e for x in p[1:-1]
        for e in x.split(max(re.split(r'( +)', x)[1::2]), maxsplit=1)
    ] + [p[-1]]
    kv_pairs = [proc_kv(k, v) for k, v in zip(p[::2], p[1::2])]
    # figure out the indentation of that line
    indent = len(s) - len(s.lstrip(' '))
    return indent, kv_pairs
```

Example on your text:

```python
>>> records = [r for r in [proc_line(s) for s in txt.splitlines()] if r]
>>> records
[(0, [('port', 1)]),
 (4, [('media_type', 'SF+_SR')]),
 (4, [('vendor', 'VENDORX')]),
 (4, [('part_number', 'SFP-10G-SR')]),
 (4, [('serial_number', 'Gxxxxxxxx')]),
 (4, [('wavelength', '850 nm')]),
 (4, [('temp', 37.0), ('status', 'Normal')]),
 (10, [('low_warn', -40.0), ('high_warn', 85.0)]),
 ...
```

Note that not only keys but also values may contain spaces (e.g. 'Wavelength : 850 nm'). We decided to use the largest run of spaces to split the intermediary '{v[i]} {k[i+1]}' substrings.
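The largest-space rule can be tried on its own. A minimal sketch: split_value_key is a hypothetical helper that mirrors the expression inside proc_line(), using re.findall instead of re.split. It assumes the chunk between two colons contains at least one space; otherwise it raises ValueError, just as proc_line()'s max() call does on an empty sequence (for example, for a value that itself contains colons, such as a MAC address).

```python
import re

def split_value_key(chunk):
    """Split a '{value} {next key}' chunk at its widest run of spaces.

    Hypothetical helper illustrating the heuristic used in proc_line().
    """
    gaps = re.findall(r' +', chunk)  # every run of spaces in the chunk
    if not gaps:
        raise ValueError(f'no space to split on in {chunk!r}')
    widest = max(gaps, key=len)
    # split only at the first occurrence of the widest gap
    value, key = chunk.split(widest, maxsplit=1)
    return value, key


print(split_value_key('34 nm  c d'))  # ('34 nm', 'c d')
print(split_value_key('34 nm c d'))   # ('34', 'nm c d')
```

The second call shows the limit of the heuristic: it relies on the switch leaving a wider gap before the next key than anywhere inside the value.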
So the width of the gap matters:

```python
>>> proc_line('  a b : 34 nm  c d : 4 ft')
(2, [('a_b', '34 nm'), ('c_d', '4 ft')])
```

but:

```python
>>> proc_line('  a b : 34 nm c d : 4 ft')
(2, [('a_b', 34), ('nm_c_d', '4 ft')])
```

Blocks

We then construct a hierarchical representation of the records in a way that takes indentation into account:

```python
def get_blocks(records, parent=None):
    # records at the indentation of the first record are siblings;
    # the records between two siblings belong to the first of them
    indent, _ = records[0]
    starts = [i for i, (o_indent, _) in enumerate(records) if o_indent == indent]
    block = [] if parent is None else parent.copy()
    # a parent with several pairs (like 'temp' + 'status') absorbs
    # the pairs of its single-line children directly
    continuation_block = len(block) > 1
    for i, j in zip(starts, starts[1:] + [len(records)]):
        _, kv = records[i]
        # := (assignment expression) requires Python 3.8+
        continuation_block &= (single_line := i + 1 == j)
        if continuation_block:
            block += kv
        elif single_line:
            # a lone line with several pairs becomes a sub-block keyed by its first key
            block += [(kv[0][0], kv)] if len(kv) > 1 else kv
        else:
            # a line followed by more deeply indented lines: recurse
            block.append((kv[0][0], get_blocks(records[i+1:j], parent=kv)))
    return block
```

Example on the records above (obtained from your txt):

```python
>>> blocks = get_blocks(records)
>>> blocks
[('port',
  [('port', 1),
   ('media_type', 'SF+_SR'),
   ('vendor', 'VENDORX'),
   ('part_number', 'SFP-10G-SR'),
   ('serial_number', 'Gxxxxxxxx'),
   ('wavelength', '850 nm'),
   ('temp',
    [('temp', 37.0),
     ...
```

Note the repeated first key in sub-blocks (e.g. ('port', [('port', 1), ...]) and ('temp', [('temp', 37.0), ...])).

Final structure

We then transform the hierarchical blocks structure into a dict, with some ad-hoc logic (for example, (k, v) pairs that share the same key do not clobber each other).
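The no-clobbering rule is the core of to_dict() below: values are collected under their key in a list rather than overwritten. A minimal sketch of just that idea, using a hypothetical group_pairs() helper on hand-picked pairs; unlike to_dict(), it neither recurses into sub-blocks nor unwraps single-element lists.

```python
def group_pairs(pairs):
    """Collect (key, value) pairs into a dict of lists, keeping duplicates.

    Hypothetical helper: a plain dict(pairs) keeps only the last value
    for a repeated key such as 'port'.
    """
    grouped = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


pairs = [('port', 1), ('port', 2), ('vendor', 'VENDORX')]
print(dict(pairs))         # {'port': 2, 'vendor': 'VENDORX'}
print(group_pairs(pairs))  # {'port': [1, 2], 'vendor': ['VENDORX']}
```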
reshape() and to_dict() below perform that transformation, and proc_txt() finally puts all the pieces together:

```python
def reshape(a):
    # a single value is unwrapped from its list
    if isinstance(a, list) and len(a) == 1:
        a = a[0]
    # a single dict gets its first key (the repeated parent key) renamed to 'value'
    if isinstance(a, dict):
        a = {'value' if i == 0 else k: v for i, (k, v) in enumerate(a.items())}
    return a

def to_dict(blocks):
    if not isinstance(blocks, list):
        return blocks  # a leaf value
    d = {}
    for k, v in blocks:
        # collect values under their key instead of overwriting them
        d[k] = d.get(k, []) + [to_dict(v)]
    return {k: reshape(v) for k, v in d.items()}

def proc_txt(txt):
    records = [r for r in [proc_line(s) for s in txt.splitlines()] if r]
    blocks = get_blocks(records)
    d = to_dict(blocks)
    return d
```

Example on your text:

```python
>>> proc_txt(txt)
{'port': [{'port': 1,
   'media_type': 'SF+_SR',
   'vendor': 'VENDORX',
   'part_number': 'SFP-10G-SR',
   'serial_number': 'Gxxxxxxxx',
   'wavelength': '850 nm',
   'temp': {'value': 37.0,
    'status': 'Normal',
    'low_warn': -40.0,
    'high_warn': 85.0,
    'low_alarm': -50.0,
    'high_alarm': 100.0},
    ...]}
```
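Since the question mentions monitoring certain values, here is one way the resulting dict could be consumed. This is a hedged sketch, not part of the answer: classify() and temp_report() are hypothetical helpers, the sample dict is hand-written in the shape of the output above (port 2 and its reading are made up), and treating a value strictly beyond a threshold as a crossing is an assumption.

```python
def classify(reading):
    """Return 'alarm', 'warn' or 'ok' for a dict shaped like the 'temp' entry.

    Assumption: a value strictly beyond a threshold counts as crossing it.
    """
    v = reading['value']
    if v < reading['low_alarm'] or v > reading['high_alarm']:
        return 'alarm'
    if v < reading['low_warn'] or v > reading['high_warn']:
        return 'warn'
    return 'ok'


def temp_report(d):
    """Map each port number to the classification of its 'temp' reading."""
    ports = d['port']
    if isinstance(ports, dict):  # a single port is not wrapped in a list
        ports = [ports]
    report = {}
    for p in ports:
        # with a single port, reshape() has renamed the 'port' key to 'value'
        port_id = p['port'] if 'port' in p else p['value']
        report[port_id] = classify(p['temp'])
    return report


# Hand-written sample; port 2 is made up for illustration.
sample = {'port': [
    {'port': 1, 'vendor': 'VENDORX',
     'temp': {'value': 37.0, 'status': 'Normal', 'low_warn': -40.0,
              'high_warn': 85.0, 'low_alarm': -50.0, 'high_alarm': 100.0}},
    {'port': 2, 'vendor': 'VENDORX',
     'temp': {'value': 90.0, 'low_warn': -40.0,
              'high_warn': 85.0, 'low_alarm': -50.0, 'high_alarm': 100.0}},
]}
print(temp_report(sample))  # {1: 'ok', 2: 'warn'}
```

The isinstance check and the 'port'/'value' fallback follow from reshape(): if the switch reports only one port, proc_txt() returns {'port': {'value': 1, ...}} rather than a list.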
