Using tree = lambda: dedfaultdict(tree)
, I can replace the following code:
from collections import defaultdict END = '$' words = ['hi', 'hello', 'hiya', 'hey'] root = {} for word in words: node = root for ch in word: node = node.setdefault(ch, {}) # <---- Code that can be replaced node[END] = None
with:
from collections import defaultdict END = '$' words = ['hi', 'hello', 'hiya', 'hey'] tree = lambda: defaultdict(tree) root = tree() for word in words: node = root for ch in word: node = node[ch] # <------ Replaced code node[END] = None
What I actually want is for each dictionary node to have a backreference to its parent dictionary node. I can do this as follows:
from collections import defaultdict BACKREF, END = 'BACKREF', '$' words = ['hi', 'hello', 'hiya', 'hey'] root = {} for word in words: node = root for ch in word: node = node.setdefault(ch, {BACKREF: node}) # <---- Code I want to replace node[END] = None
(proof that this works: link)
So, given that I was able to use tree = lambda: defaultdict(tree)
to replace
node = node.setdefault(ch, {})
- with
node = node[ch]
is there a way I can use a modified version of tree = lambda: default(tree)
to replace
node = node.setdefault(ch, {BACKREF: node})
- with something simpler, such as perhaps
node = node[ch]
?
I have tried something like:
def tree(): _ = defaultdict(tree) _[BACKREF] = ? return _ root = tree() h = root['h']
But that would require tree
to know which dictionary invoked the call to tree
. E.g. in h = root['h']
, root['h']
invokes a call to tree
because h
is not yet in root
. tree
would have to know it was invoked via the call root['h']
so that it could do h[BACKREF] = root
. Is there a way around this? Is it a bad idea even if it can be done?
I know the backreferences technically means that the trie will have cycles (and not be truly a tree), but the way I plan to traverse the trie, this won’t be an issue. The reason I want backreferences is because it will be useful if I want to delete a word from the trie. For example, say I have the following trie:
and that I am at root['h']['e']['l']['l']['o']
and want to delete 'hello'
from the trie. I can do this by backtracking up the trie from root['h']['e']['l']['l']['o']
to root['h']['e']['l']['l']
to root['h']['e']['l']
to root['h']['e']
(I stop here because len(set(root['h']['e'].keys()) - {BACKREF}) > 1
. Then I can simply do del root['h']['e']['l']
and I will have cleaved off the 'llo$'
from 'he'
meaning the trie will still have 'hey'
. Although there are alternatives, backtracking up the trie will be very easy with the backreferences.
Context on tree = lambda: defaultdict(tree)
Using:
from collections import defaultdict tree = lambda: defaultdict(tree) root = tree()
one can create arbitrarily nested dict
s. E.g. after:
root['h']['i'] root['h']['e']['l']['l']['o'] root['h']['i']['y']['a'] root['h']['e']['y']
root
will look like:
{'h': {'i': {'y': {'a': {}}}, 'e': {'y': {}, 'l': {'l': {'o': {}}}}}}
This represents a tree that looks like this: visualized using https://www.cs.usfca.edu/~galles/visualization/Trie.html
Advertisement
Answer
Solution 1
Giving the same {BACKREF: node}
to defaultdict
:
from collections import defaultdict BACKREF, END = 'BACKREF', '$' words = ['hi', 'hello', 'hiya', 'hey'] tree = lambda: defaultdict(tree, {BACKREF: node}) node = None root = tree() for word in words: node = root for ch in word: node = node[ch] node[END] = None
The root
node has a backref None
, can be deleted if bothersome.
Solution 2
The above works fine if that code is the only code that creates tree nodes (seems likely to me, judging by the times I’ve built such trees myself). Otherwise you’d need to ensure that node
points to the correct parent node. If that’s an issue, here’s an alternative with a dict (not defaultdict) subclass that implements __missing__
to automatically create children with backrefs when needed:
BACKREF, END = 'BACKREF', '$' words = ['hi', 'hello', 'hiya', 'hey'] class Tree(dict): def __missing__(self, key): child = self[key] = Tree({BACKREF: self}) return child root = Tree() for word in words: node = root for ch in word: node = node[ch] node[END] = None
Also doesn’t give the root a backref, and being a dict, its string representations are far less cluttered than a defaultdict’s and thus far more readable:
>>> import pprint >>> pprint.pp(root) {'h': {'BACKREF': <Recursion on Tree with id=2494556270320>, 'i': {'BACKREF': <Recursion on Tree with id=2494556270400>, '$': None, 'y': {'BACKREF': <Recursion on Tree with id=2494556270480>, 'a': {'BACKREF': <Recursion on Tree with id=2494556340608>, '$': None}}}, 'e': {'BACKREF': <Recursion on Tree with id=2494556270400>, 'l': {'BACKREF': <Recursion on Tree with id=2494556340288>, 'l': {'BACKREF': <Recursion on Tree with id=2494556340368>, 'o': {'BACKREF': <Recursion on Tree with id=2494556340448>, '$': None}}}, 'y': {'BACKREF': <Recursion on Tree with id=2494556340288>, '$': None}}}}
The defaultdict result for comparison:
>>> pprint.pp(root) defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': None, 'h': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930855152>, 'i': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930855312>, '$': None, 'y': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930912832>, 'a': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930913232>, '$': None})})}), 'e': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930855312>, 'l': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930912912>, 'l': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930912992>, 'o': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930913072>, '$': None})})}), 'y': defaultdict(<function <lambda> at 0x000001A13760BE50>, {'BACKREF': <Recursion on defaultdict with id=1791930912912>, '$': None})})})})