subclassing dict; dict.update returns incorrect value – python bug?

I needed to make a class that extended dict and ran into an interesting problem illustrated by the dumb example in the image below.

[Image: subclassing_dict_problem]

Why is d.update() ignoring the class’s __getitem__?

EDIT: This is in Python 2.7, which does not appear to contain collections.UserDict. Thinking UserDict.UserDict is the equivalent, I tried this, and it gets closer, but still behaves interestingly.

[Image: updated to use UserDict]

Answer

This is an example of the open-closed principle (the class is open for extension but closed for modification). It is a good thing to have because it allows subclassers to extend or override a method without unintentionally triggering behavior changes in other methods and without breaking the class’s invariants.
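The original screenshot is not reproduced here, so the following is only a minimal sketch of the kind of behavior being asked about. The class name `Doubled` is hypothetical, and the `update()` result reflects how CPython treats a dict subclass that does not override `__iter__`.

```python
class Doubled(dict):
    """Hypothetical dict subclass whose __getitem__ doubles the stored value."""

    def __getitem__(self, key):
        return 2 * dict.__getitem__(self, key)


d = Doubled(a=1)

via_index = d["a"]              # goes through the overridden __getitem__
via_get = d.get("a")            # dict.get() reads the stored value directly
via_values = list(d.values())   # values() also returns the stored values

plain = {}
plain.update(d)                 # in CPython, update() copies the stored values
```

Only the subscript access sees the doubled value; `get()`, `values()`, and `update()` all work with what is actually stored.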

We even do this in pure Python code as well; for example, inside the pure Python ordered dict code, the class-local call from __init__() to update() is done using name mangling. This allows a subclasser to override update() without accidentally breaking __init__().
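The name-mangling pattern can be sketched roughly as follows. `Registry` and `LoudRegistry` are hypothetical classes made up for illustration; this is not the ordered dict source itself.

```python
class Registry(object):
    """Hypothetical container that protects __init__ via name mangling."""

    def __init__(self, pairs=()):
        self.data = {}
        # Inside the class body this name is mangled to _Registry__update,
        # so a subclass that defines its own update() cannot replace it.
        self.__update(pairs)

    def update(self, pairs):
        for key, value in pairs:
            self.data[key] = value

    # Private class-local alias that keeps pointing at the original update().
    __update = update


class LoudRegistry(Registry):
    def update(self, pairs):
        raise RuntimeError("update() is disabled in this subclass")


r = LoudRegistry([("a", 1)])    # __init__ still populates the data
```

The subclass changes what the public `update()` does, while construction keeps using the base class's own copy of the method.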

Sometimes, this is inconvenient. It means that a subclasser has to override every method whose behavior they want to change including get(), update(), and others. However, there are offsetting benefits (protection of internal invariants, preventing implementation details from leaking from the abstraction, and allowing users to assume the methods are independent of one another).
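In practice that means spelling out each read path you care about. A minimal sketch, again using a hypothetical `Doubled` class and covering only a few methods:

```python
class Doubled(dict):
    """Hypothetical subclass: each read path is overridden explicitly."""

    def __getitem__(self, key):
        return 2 * dict.__getitem__(self, key)

    def get(self, key, default=None):
        # Route get() through the overridden __getitem__.
        try:
            return self[key]
        except KeyError:
            return default

    def values(self):
        return [self[key] for key in self]

    def items(self):
        return [(key, self[key]) for key in self]


d = Doubled(a=1, b=2)
```

Methods left out of this sketch, such as `pop()` and `setdefault()`, would still return the stored values unless they are overridden as well.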

This style (chosen by Guido from the outset) is the default for the builtin types (otherwise we would forever be fighting segfaulting invariant violations) and for some pure python classes.

We do document when there is a departure from the default. For example, the cmd module uses the framework design pattern, letting the user define various do_action() methods. Also, some of the http modules do the same, specifically documenting that a user’s do_GET() method is called and that is how you attach customized HTTP event handlers.
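For contrast, here is a small sketch of a documented hook in the cmd module, assuming Python 3. The `Greeter` class and its `greet` command are hypothetical.

```python
import cmd
import io


class Greeter(cmd.Cmd):
    """Hypothetical command interpreter with a single do_* hook."""

    def do_greet(self, arg):
        # cmd.Cmd dispatches the command word "greet" to this method.
        self.stdout.write("hello, %s\n" % arg)


out = io.StringIO()
shell = Greeter(stdout=out)
shell.onecmd("greet world")     # looks up do_greet() and calls it with "world"
```

Here the base class promises to call the `do_*` methods, so overriding one is the intended way to customize it.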

In the absence of specifically documented method hooks (i.e. those listed above or methods like dict.__missing__()), a subclasser should presume method independence. Otherwise, how are you to know whether __getitem__() calls get() under the hood or vice-versa?
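The `__missing__()` hook is a good illustration of how narrow a documented hook is. In this sketch (the class name `ZeroDefault` is hypothetical), it is called by subscript lookup on a dict subclass and by nothing else:

```python
class ZeroDefault(dict):
    """Hypothetical subclass that relies on the documented __missing__ hook."""

    def __missing__(self, key):
        return 0


c = ZeroDefault()

via_index = c["absent"]     # dict.__getitem__ calls __missing__ for absent keys
via_get = c.get("absent")   # get() does not use the hook
stored = "absent" in c      # nothing was stored by the lookup
```

Even with the hook defined, `get()` keeps its own independent behavior and returns its default.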

FWIW, this isn’t unique to Python. It comes up quite a bit in object-oriented programming. Correctly designed classes either document root methods that affect the behavior of other methods or they are presumed to be independent.

There may need to be a FAQ for this, but nothing is broken or wrong here (other than Python having way too many dict variants to choose from). If someone mistakenly assumes or believes that __getitem__() must be called by the other accessor methods, they find out very quickly that the assumption is wrong (that is, if they run even minimal tests on the code).
