Skip to content
Advertisement

Nested `defaultdict of defaultdict of defaultdict` each with a backreference

Using tree = lambda: dedfaultdict(tree), I can replace the following code:

from collections import defaultdict

END = '$'
words = ['hi', 'hello', 'hiya', 'hey']

root = {}
for word in words:
  node = root
  for ch in word:
    node = node.setdefault(ch, {}) # <---- Code that can be replaced
  node[END] = None

with:

from collections import defaultdict

END = '$'
words = ['hi', 'hello', 'hiya', 'hey']

tree = lambda: defaultdict(tree)
root = tree()
for word in words:
  node = root
  for ch in word:
    node = node[ch] # <------ Replaced code
  node[END] = None

What I actually want is for each dictionary node to have a backreference to its parent dictionary node. I can do this as follows:

from collections import defaultdict

BACKREF, END = 'BACKREF', '$'
words = ['hi', 'hello', 'hiya', 'hey']

root = {}
for word in words:
  node = root
  for ch in word:
    node = node.setdefault(ch, {BACKREF: node}) # <---- Code I want to replace
  node[END] = None

(proof that this works: link)

So, given that I was able to use tree = lambda: defaultdict(tree) to replace

  • node = node.setdefault(ch, {})
  • with node = node[ch]

is there a way I can use a modified version of tree = lambda: default(tree) to replace

  • node = node.setdefault(ch, {BACKREF: node})
  • with something simpler, such as perhaps node = node[ch] ?

I have tried something like:

def tree():
  _ = defaultdict(tree)
  _[BACKREF] = ?
  return _
root = tree()
h = root['h']

But that would require tree to know which dictionary invoked the call to tree. E.g. in h = root['h'], root['h'] invokes a call to tree because h is not yet in root. tree would have to know it was invoked via the call root['h'] so that it could do h[BACKREF] = root. Is there a way around this? Is it a bad idea even if it can be done?

I know the backreferences technically means that the trie will have cycles (and not be truly a tree), but the way I plan to traverse the trie, this won’t be an issue. The reason I want backreferences is because it will be useful if I want to delete a word from the trie. For example, say I have the following trie:

Trie

and that I am at root['h']['e']['l']['l']['o'] and want to delete 'hello' from the trie. I can do this by backtracking up the trie from root['h']['e']['l']['l']['o'] to root['h']['e']['l']['l'] to root['h']['e']['l'] to root['h']['e'] (I stop here because len(set(root['h']['e'].keys()) - {BACKREF}) > 1. Then I can simply do del root['h']['e']['l'] and I will have cleaved off the 'llo$' from 'he' meaning the trie will still have 'hey'. Although there are alternatives, backtracking up the trie will be very easy with the backreferences.


Context on tree = lambda: defaultdict(tree)

Using:

from collections import defaultdict

tree = lambda: defaultdict(tree)
root = tree()

one can create arbitrarily nested dicts. E.g. after:

root['h']['i']
root['h']['e']['l']['l']['o']
root['h']['i']['y']['a']
root['h']['e']['y']

root will look like:

{'h': {'i': {'y': {'a': {}}}, 'e': {'y': {}, 'l': {'l': {'o': {}}}}}}

This represents a tree that looks like this: Trie visualized using https://www.cs.usfca.edu/~galles/visualization/Trie.html

Advertisement

Answer

Solution 1

Giving the same {BACKREF: node} to defaultdict:

from collections import defaultdict

BACKREF, END = 'BACKREF', '$'
words = ['hi', 'hello', 'hiya', 'hey']

tree = lambda: defaultdict(tree, {BACKREF: node})
node = None
root = tree()
for word in words:
  node = root
  for ch in word:
    node = node[ch]
  node[END] = None

The root node has a backref None, can be deleted if bothersome.

Solution 2

The above works fine if that code is the only code that creates tree nodes (seems likely to me, judging by the times I’ve built such trees myself). Otherwise you’d need to ensure that node points to the correct parent node. If that’s an issue, here’s an alternative with a dict (not defaultdict) subclass that implements __missing__ to automatically create children with backrefs when needed:

BACKREF, END = 'BACKREF', '$'
words = ['hi', 'hello', 'hiya', 'hey']

class Tree(dict):
    def __missing__(self, key):
        child = self[key] = Tree({BACKREF: self})
        return child

root = Tree()
for word in words:
  node = root
  for ch in word:
    node = node[ch]
  node[END] = None

Also doesn’t give the root a backref, and being a dict, its string representations are far less cluttered than a defaultdict’s and thus far more readable:

>>> import pprint
>>> pprint.pp(root)
{'h': {'BACKREF': <Recursion on Tree with id=2494556270320>,
       'i': {'BACKREF': <Recursion on Tree with id=2494556270400>,
             '$': None,
             'y': {'BACKREF': <Recursion on Tree with id=2494556270480>,
                   'a': {'BACKREF': <Recursion on Tree with id=2494556340608>,
                         '$': None}}},
       'e': {'BACKREF': <Recursion on Tree with id=2494556270400>,
             'l': {'BACKREF': <Recursion on Tree with id=2494556340288>,
                   'l': {'BACKREF': <Recursion on Tree with id=2494556340368>,
                         'o': {'BACKREF': <Recursion on Tree with id=2494556340448>,
                               '$': None}}},
             'y': {'BACKREF': <Recursion on Tree with id=2494556340288>,
                   '$': None}}}}

The defaultdict result for comparison:

>>> pprint.pp(root)
defaultdict(<function <lambda> at 0x000001A13760BE50>,
            {'BACKREF': None,
             'h': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                              {'BACKREF': <Recursion on defaultdict with id=1791930855152>,
                               'i': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                {'BACKREF': <Recursion on defaultdict with id=1791930855312>,
                                                 '$': None,
                                                 'y': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                  {'BACKREF': <Recursion on defaultdict with id=1791930912832>,
                                                                   'a': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                                    {'BACKREF': <Recursion on defaultdict with id=1791930913232>,
                                                                                     '$': None})})}),
                               'e': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                {'BACKREF': <Recursion on defaultdict with id=1791930855312>,
                                                 'l': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                  {'BACKREF': <Recursion on defaultdict with id=1791930912912>,
                                                                   'l': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                                    {'BACKREF': <Recursion on defaultdict with id=1791930912992>,
                                                                                     'o': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                                                      {'BACKREF': <Recursion on defaultdict with id=1791930913072>,
                                                                                                       '$': None})})}),
                                                 'y': defaultdict(<function <lambda> at 0x000001A13760BE50>,
                                                                  {'BACKREF': <Recursion on defaultdict with id=1791930912912>,
                                                                   '$': None})})})})
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement