UPDATED based on Lennart Regebro’s answer
Suppose you iterate through a dictionary, and sometimes need to delete an element. The following is very efficient:
remove = [] for k, v in dict_.items(): if condition(k, v): remove.append(k) continue # do other things you need to do in this loop for k in remove: del dict_[k]
The only overhead here is building the list of keys to remove; unless it grows large compared to the dictionary size, it’s not an issue. However, this approach requires some extra coding, so it’s not very popular.
The popular dict comprehension approach:
dict_ = {k : v for k, v in dict_ if not condition(k, v)} for k, v in dict_.items(): # do other things you need to do in this loop
results in a full dictionary copy, and so has the risk of a silly performance hit if dictionaries grow large or the containing function is called often.
A much better approach is to copy the keys only rather than whole dictionary:
for k in list(dict_.keys()): if condition(k, dict_[k]): del dict_[k] continue # do other things you need to do in this loop
(Note that all code examples are in Python 3, so keys()
, items()
returns a view, not a copy.)
In most cases, it won’t hurt performance that much, since the time to check even the simplest condition (not to mention other stuff you’re doing in the loop) is usually greater than the time to add one key to a list.
Still, I am wondering if it’s possible to avoid even that with a custom dictionary that allows deletions while iterating:
for k, v in dict_.items(): if condition(k, v): del dict_[k] continue # do other things you need to do in this loop
Perhaps an iterator could always look ahead, so that when the __next__
is called, the iterator knows where to go without even looking at the current element (it would only need to look at the element when it first gets to it). And if there is no next element, the iterator could just set the flag that would cause StopIteration
exception raised whenever __next__
is called again.
If the element the iterator tries to advance to turns out to be deleted, it’s fine to raise an exception; there is no need to support deletions while multiple iterations are going on simultaneously.
Are there any problems with this approach?
One problem is that I’m not sure it can be done with no material overhead compared to the existing dict
; otherwise, it would be faster to use the list(dict_)
approach!
UPDATE:
I tried all the versions. I don’t report the timing, since they are clearly very dependent on the exact situation. But it seems safe to say that in many cases, the fastest approach is likely to be list(dict_)
. After all, if you think about, the copy is the fastest operation that grows linearly with size of the list; almost any other overhead, as long as it’s also proportional to the list size, is likely to be bigger.
I really like all the ideas, but since I have to select only one, I’m accepting the context manager solution since it allows to use the dictionary as either normal or “enhanced” with very small code changes.
Advertisement
Answer
As you note, you can store the items to delete somewhere and defer the deletion of them until later. The problem then becomes when to purge them and how to make sure that the purge method eventually gets called. The answer to this is a context manager which is also a subclass of dict
.
class dd_dict(dict): # the dd is for "deferred delete" _deletes = None def __delitem__(self, key): if key not in self: raise KeyError(str(key)) dict.__delitem__(self, key) if self._deletes is None else self._deletes.add(key) def __enter__(self): self._deletes = set() def __exit__(self, type, value, tb): for key in self._deletes: try: dict.__delitem__(self, key) except KeyError: pass self._deletes = None
Usage:
# make the dict and do whatever to it ddd = dd_dict(a=1, b=2, c=3) # now iterate over it, deferring deletes with ddd: for k, v in ddd.iteritems(): if k is "a": del ddd[k] print ddd # shows that "a" is still there print ddd # shows that "a" has been deleted
If you’re not in a with
block, of course, deletes are immediate; as this is a dict
subclass, it works just like a regular dict
outside of a context manager.
You could also implement this as a wrapper class for a dictionary:
class deferring_delete(object): def __init__(self, d): self._dict = d def __enter__(self): self._deletes = set() return self def __exit__(self, type, value, tb): for key in self._deletes: try: del self._dict[key] except KeyError: pass del self._deletes def __delitem__(self, key): if key not in self._dict: raise KeyError(str(key)) self._deletes.add(key) d = dict(a=1, b=2, c=3) with deferring_delete(d) as dd: for k, v in d.iteritems(): if k is "a": del dd[k] # delete through wrapper print d
It’s even possible to make the wrapper class fully functional as a dictionary, if you want, though that’s a fair bit more code.
Performance-wise, this is admittedly not such a win, but I like it from a programmer-friendliness standpoint. The second method should be very slightly faster since it’s not testing a flag on each delete.