
Dill doesn’t seem to respect metaclass

I have recently begun working with dill. I have a metaclass which I use to implement the Singleton pattern, so that there is only ever one object at any instant. I am using dill to serialise it. The problem is that once the object is loaded back, it doesn’t respect the Singleton pattern (enforced by the metaclass) and __init__ gets called.

Here is the code which can reproduce the issue
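
A minimal sketch of such a reproduction, not the asker's exact code: it reuses the names mentioned in the question (TestClass, tobj, get_method, _instances), while the Singleton metaclass body and the init_calls counter are assumptions added for illustration. It uses the standard-library pickle as a stand-in for dill so it runs without extra packages, and it simulates the second run by clearing the registry in the same process.

```python
import pickle


class Singleton(type):
    """Metaclass that hands out one cached instance per class."""

    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in Singleton._instances:
            Singleton._instances[cls] = super().__call__(*args, **kwargs)
        return Singleton._instances[cls]


class TestClass(metaclass=Singleton):
    init_calls = 0  # hypothetical counter, only here to make __init__ visible

    def __init__(self):
        TestClass.init_calls += 1
        self.message = "hi"

    def get_method(self):
        print(self.message)


# "First run": repeated calls return the same object, __init__ runs once.
tobj = TestClass()
same = TestClass()
payload = pickle.dumps(tobj)

# "Second run", simulated by emptying the registry as a fresh process would.
Singleton._instances.clear()
loaded = pickle.loads(payload)  # neither Singleton.__call__ nor __init__ runs
fresh = TestClass()             # registry is empty, so __init__ runs again
```

Here fresh and loaded end up as two different objects, which is the behaviour described in the question.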


On the first run __init__ is called only once, so tobj.get_method() would print “hi”. But in the second run, when tobj is loaded from dill, a call to TestClass() triggers __init__ again. Is there any way to fix this, so that dill takes the metaclass into account?

I understand that a Singleton is not really needed in Python. But I have gone too far now, with thousands of lines of code, and I am hoping to find a way out without a rewrite. I would really appreciate your help.


Answer

So, first of all: when an ordinary object is serialized and then unserialized (via pickle.load or dill.load), its ordinary initialization mechanism, that is, calling its __init__, will not be run. That is the desired outcome: you want the object’s previous state, without triggering any initialization side effects.
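
A small sketch of this with the standard-library pickle; the Greeter class and its init_calls counter are made-up names for illustration.

```python
import pickle


class Greeter:
    init_calls = 0  # hypothetical counter to observe __init__

    def __init__(self):
        Greeter.init_calls += 1
        self.message = "hi"


original = Greeter()
restored = pickle.loads(pickle.dumps(original))

# The state came back, but __init__ was not called a second time.
print(Greeter.init_calls, restored.message)
```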

When unserializing an instance of a class that has a metaclass, the same outcome is obviously desired: so the metaclass’s __call__ will NOT be run, as that would trigger initialization side effects.

The problem is that, in this arrangement for singletons, what guarantees the “singleton” is exactly a side effect of instantiating the class. It does not happen when the class is created (that would mean moving the duplicate check to the metaclass __init__), but when the already created class is instantiated, which is when the metaclass’ __call__ is run. This call is correctly skipped in the tobj = dill.load(statehandle) line.

So, when you then call TestClass() after loading, the _instances registry is empty, and a new instance is created.

Now, that is what would take place with ordinary pickle, not with dill (see below).

Going back to your singleton: you have to keep in mind that at some point the singleton is actually created for the first time in the running process. When unpickling an object that is meant to behave as a singleton, if the class can detect that an instance already exists at that moment, it can reuse it.

However, unpickling will skip the normal instantiation through the metaclass __call__ and run the class’ __new__ directly. So the class __new__ must be aware of the singleton mechanism, which means that, regardless of the metaclass, a base class with a __new__ method is needed. Since we also want to avoid re-running __init__, we still need the metaclass __call__; otherwise Python will call __init__ on every ordinary (non-unpickling) instantiation. The base class __new__ thus has to collaborate with the metaclass __call__ in using the cache mechanism.

After creating the instance by calling __new__ on the class, ordinary unpickling restores the instance state by updating its namespace, which is exposed in the __dict__ attribute.

With a collaborative metaclass __call__ and base class __new__ in place, the serialized singleton works with ordinary pickle:
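
To see that sequence for a plain class with the default pickle protocol, here is a small sketch; Traced and the calls list are made-up names used only to record which special methods run.

```python
import pickle

calls = []  # hypothetical trace of which special methods ran


class Traced:
    def __new__(cls, *args, **kwargs):
        calls.append("__new__")
        return super().__new__(cls)

    def __init__(self, value):
        calls.append("__init__")
        self.value = value


obj = Traced(42)
calls.clear()  # forget the calls made by the ordinary instantiation

copy = pickle.loads(pickle.dumps(obj))

# Only __new__ ran during loading; the attributes arrived through __dict__.
print(calls, copy.__dict__)
```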
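
A minimal sketch of that arrangement, under my own naming: SingletonMeta, SingletonBase, the _instance attribute and the init_calls counter are assumptions for illustration, and the "new process" is simulated by deleting the cached instance.

```python
import pickle


class SingletonMeta(type):
    """Ordinary instantiation: run __init__ only when a new instance is made."""

    def __call__(cls, *args, **kwargs):
        is_new = "_instance" not in cls.__dict__
        instance = cls.__new__(cls, *args, **kwargs)
        if is_new:
            instance.__init__(*args, **kwargs)
        return instance


class SingletonBase(metaclass=SingletonMeta):
    """Unpickling calls __new__ directly, so the cache is consulted here."""

    def __new__(cls, *args, **kwargs):
        if "_instance" not in cls.__dict__:
            cls._instance = super().__new__(cls)
        return cls._instance


class TestClass(SingletonBase):
    init_calls = 0  # hypothetical counter to observe __init__

    def __init__(self):
        TestClass.init_calls += 1
        self.message = "hi"


tobj = TestClass()
payload = pickle.dumps(tobj)

# Same process: unpickling finds the cached instance and reuses it.
reloaded_here = pickle.loads(payload)

# Simulated new process: drop the cache, then load before any TestClass() call.
del TestClass._instance
loaded = pickle.loads(payload)
again = TestClass()  # reuses the loaded instance, __init__ is not re-run
```

The cache is looked up in cls.__dict__ rather than with hasattr so that each subclass of SingletonBase gets its own instance.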


using dill

The fact that dill, by default, actually serializes each object's class itself (it stores the class definition in the serialized file and rebuilds the class on deserializing) complicates things by another order of magnitude.

What happens is related to the “singleton” behavior in Python: as I wrote in the comment, complicating the pattern is not healthy, because when you bind a variable at module level (formally called a “global” variable, but different from “global” in other languages, as it is scoped to the module), you already have a “singleton”. And the language uses this behavior all the time: if you think about it, any class or function in Python is already a “singleton”.

No special mechanism is needed to guarantee that classes and functions are singletons: they are created by the def and class statements in the module, which are executed exactly once. (If you look around Stack Overflow, though, you will see people getting weird errors in Python when they manage to import the same module twice by misusing the import mechanism.)

And now, surprise: there is one more thing that breaks the “singleton-ness” of classes: dill deserialization itself! On loading a file, it creates the class again. That is the only possible mechanism to make a class available in a project where its code is not present (which is dill’s purpose).
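
As an illustration of the module-level approach, a sketch with made-up names (_Settings, settings, get_settings); in a real project this would live in its own module and be imported wherever it is needed.

```python
class _Settings:
    def __init__(self):
        self.message = "hi"


# Bound once, when the module is first imported; every importer of the
# module sees this same object, so no metaclass is needed.
settings = _Settings()


def get_settings():
    # Functions are module-level objects too, created once by their def.
    return settings
```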
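
The effect of running a class definition twice can be seen without dill at all. The sketch below only simulates it with exec on a source string (SOURCE and run_class_body are made-up names); it makes no claim about how dill does this internally.

```python
SOURCE = """
class TestClass:
    pass
"""


def run_class_body():
    # Each exec runs the class statement again and builds a new class object.
    namespace = {}
    exec(SOURCE, namespace)
    return namespace["TestClass"]


first = run_class_body()
second = run_class_body()

# Same name, two distinct classes: instances of one are not instances
# of the other, and any per-class cache is not shared between them.
print(first is second, isinstance(first(), second))
```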

If you do not need dill to actually serialize the classes, and only switched from pickle to dill in order to be able to serialize the singleton, you can either use pickle, or call dill.dump with the optional byref=True argument: this will avoid serializing the classes themselves, and the code above will work. Otherwise, this is the second time ever I have needed a second-order metaclass, in order to avoid dill’s class duplication:
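
A rough sketch of the second-order metaclass idea: a metaclass for the metaclass, which keeps a registry of the classes already created and returns the existing class when one with the same module and qualified name is defined again. The names ClassRegistry, SingletonMeta and define_test_class are my own, the singleton logic is left out, and the repeated class definition is simulated with a function rather than with dill, so whether a given dill version rebuilds classes through this path is an assumption to check.

```python
class ClassRegistry(type):
    """Second-order metaclass: its instances are themselves metaclasses."""

    _classes = {}

    def __call__(metacls, name, bases, namespace, **kwargs):
        # Runs whenever a class using the metaclass `metacls` is created.
        key = (namespace.get("__module__"), namespace.get("__qualname__", name))
        if key not in ClassRegistry._classes:
            ClassRegistry._classes[key] = super().__call__(
                name, bases, namespace, **kwargs
            )
        return ClassRegistry._classes[key]


class SingletonMeta(type, metaclass=ClassRegistry):
    """Stand-in for the singleton metaclass; its own logic is omitted here."""


def define_test_class():
    # Executing this class statement twice simulates a class being rebuilt.
    class TestClass(metaclass=SingletonMeta):
        pass

    return TestClass


first = define_test_class()
second = define_test_class()  # the registry returns the class created first
```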


Possible extra issues:

  1. If your singletons take extra arguments in their __init__, those will show up in the __new__ method as well, but should not be forwarded to object.__new__. Simply call super().__new__(cls) in the base class __new__.

  2. You mentioned an existing codebase where you did not want to replace the singleton mechanism. If this means you can’t insert the base class from these snippets by ordinary means, then the __new__ method of the metaclass should be written to add a __new__ method to the singletons (either by inserting the current base class as a mixin, or by injecting a __new__ method into them). In that case, please ask a follow-up question and add the “metaclass” tag; I should see it later.
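
For item 1, a small sketch of the difference (Base, Named and Broken are made-up names): a __new__ that accepts the extra arguments but does not pass them on works, while one that forwards them to object.__new__ raises TypeError.

```python
class Base:
    def __new__(cls, *args, **kwargs):
        # Accept whatever __init__ will receive, but pass only the class on.
        return super().__new__(cls)


class Named(Base):
    def __init__(self, name):
        self.name = name


class Broken:
    def __new__(cls, *args, **kwargs):
        # Forwarding the arguments to object.__new__ is rejected.
        return super().__new__(cls, *args, **kwargs)

    def __init__(self, name):
        self.name = name


named = Named("config")

try:
    Broken("config")
    forwarding_failed = False
except TypeError:
    forwarding_failed = True
```

Accepting *args and **kwargs also keeps __new__ callable with no arguments, which is how unpickling calls it.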

User contributions licensed under: CC BY-SA