I have recently begun working with dill. I have a metaclass which I use to implement the Singleton pattern, so that there is always at most one instance of the class at any time, and I am using dill to serialise it. The problem is that once the object is loaded back, it doesn't respect the Singleton pattern (enforced by the metaclass) and __init__ gets called.
Here is the code which reproduces the issue:
```python
import os.path
import dill


class SingletonBase(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(SingletonBase, cls).__call__(*args, **kwargs)
        return cls._instances[cls]


class TestClass(metaclass=SingletonBase):
    def __init__(self):
        self.testatrr = "hello"

    def set_method(self):
        self.testatrr = "hi"

    def get_method(self):
        print(self.testatrr)


if os.path.isfile("statefile.dill"):
    with open("statefile.dill", 'rb') as statehandle:
        tobj = dill.load(statehandle)
else:
    tobj = TestClass()
    tobj.set_method()

tobj = TestClass()  # init Shouldn't get called
tobj.get_method()

with open("statefile.dill", 'wb') as statehandle:
    dill.dump(tobj, statehandle)
```
On the first run __init__ is called only once, so tobj.get_method() prints "hi". But on the second run, when tobj is loaded from dill, the call to TestClass() triggers __init__. Is there any way to fix this, to get dill to incorporate the metaclass?
I understand a Singleton-like mechanism is really not needed in Python, but I have gone too far now, with thousands of lines of code. I am hoping to find a way out without a rewrite. I would really appreciate your help.
Answer
So, first of all: when serializing an ordinary object, unpickling it (via pickle or dill.load) will not run its ordinary initialization mechanism, that is, its __init__. That is the desired outcome: you want the object's previous state back, without triggering any initialization side-effects.

When unserializing an instance of a class with a metaclass with dill, the same outcome is obviously desired: so dill will NOT run the metaclass's __call__, as that would trigger initialization side-effects.
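This is easy to verify with plain pickle; a minimal sketch (the Meta and Widget names are made up for illustration):

```python
import pickle

class Meta(type):
    def __call__(cls, *args, **kwargs):
        print("metaclass __call__ running")
        return super().__call__(*args, **kwargs)

class Widget(metaclass=Meta):
    pass

w = Widget()                        # prints "metaclass __call__ running"
w2 = pickle.loads(pickle.dumps(w))  # prints nothing: unpickling bypasses __call__
```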
The problem is that in this problematic arrangement for singletons, what guarantees the "singleton" is exactly a side effect of class instantiation. Not of class creation (which would mean moving the duplicate-verification test to the metaclass __init__), but of instantiating the already created class – which is when the metaclass's __call__ is run. This call is correctly skipped by dill in the tobj = dill.load(statehandle) line.
So, when you try to create a new instance of TestClass below, the _instances registry is empty, and a new instance is created.

Now – that is what would take place even with an ordinary "pickle"; "dill" adds further complications (see below).
Going back to your singleton: you have to keep in mind that at some point the singleton is actually created for the first time in the running process. When unpickling an object meant to behave as a singleton, if the code can detect that an instance already exists at that point, it can reuse it.
However, unpickling will skip the normal instantiation through the metaclass __call__ and run the class's __new__ directly. So the class's __new__ must be aware of the singleton mechanism, which means that, regardless of the metaclass, a base class with a __new__ method is needed. And since we want to avoid re-running __init__, we still need the metaclass __call__; otherwise Python would call __init__ on every ordinary (non-unpickling) instantiation. The base class's __new__ thus has to collaborate with the metaclass's __call__ in using the cache mechanism.
After calling __new__ on the class, ordinary unpickling will restore the instance state by updating its namespace, which is exposed in the __dict__ attribute.
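In essence, unpickling does the equivalent of the following (a simplified sketch, not dill's actual code; the saved_state dict stands in for the serialized attributes, using TestClass from the question):

```python
saved_state = {"testatrr": "hi"}         # illustrative serialized __dict__
instance = TestClass.__new__(TestClass)  # bypasses metaclass __call__ and __init__
instance.__dict__.update(saved_state)    # state restored straight into __dict__
```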
With a collaborative metaclass __call__ and base class __new__ in place, the serialized singleton works with ordinary pickle:
```python
import os.path
import pickle


class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        mcls = type(cls)
        if cls not in mcls._instances:
            # all that type.__call__ does is call the cls' __new__ and its __init__ in sequence:
            instance = cls.__new__(cls, *args, **kwargs)
            instance.__init__(*args, **kwargs)
        else:
            instance = mcls._instances[cls]
        return instance


class SingletonBase(metaclass=SingletonMeta):
    def __new__(cls, *args, **kwargs):
        mcls = type(cls)
        instance = mcls._instances.get(cls)
        if not instance:
            instance = mcls._instances[cls] = super().__new__(cls, *args, **kwargs)
        return instance


class TestClass(SingletonBase):
    def __init__(self):
        print("at init")
        self.testatrr = "init run"

    def set_method(self):
        self.testatrr = "set run"

    def get_method(self):
        print(self.testatrr)


if os.path.isfile("statefile.pickle"):
    with open("statefile.pickle", 'rb') as statehandle:
        print("unpickling")
        tobj = pickle.load(statehandle)
else:
    tobj = TestClass()
    tobj.set_method()

tobj = TestClass()  # init Shouldn't get called
tobj.get_method()

with open("statefile.pickle", 'wb') as statehandle:
    pickle.dump(tobj, statehandle)
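If I read the flow correctly, the first run prints "at init" followed by "set run"; subsequent runs print "unpickling" and then "set run", with no further "at init" – the cached instance is reused both by unpickling (through __new__) and by the explicit TestClass() call (through the metaclass __call__).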
Using dill
The fact that dill, by default, actually serializes each object's class itself (it does that by putting the actual class definition in the serialized file and re-executing it on deserializing) complicates things by another order of magnitude.
What happens is related to the "singleton" behavior in Python: as I wrote in the comment, complicating the pattern is not healthy, because when you bind a variable at module level (formally called a "global" variable, though it differs from "global" in other languages, as it is scoped to the module), you already have a "singleton". And the language uses this behavior all the time: if you think about it, any class or function in Python is already a "singleton".
No special mechanism is needed to guarantee classes and functions are singletons: they are single instances by the fact that they are created by the def and class statements in the module, which are executed exactly once. (If you look around Stack Overflow, though, you will see people getting weird errors in Python when they manage to import the same module twice by misusing the import mechanism.)
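A quick illustration of this built-in singleton behavior:

```python
import math
import math as math_again

print(math is math_again)            # True: a module lives once in sys.modules
print(math.sqrt is math_again.sqrt)  # True: the function object was created once
```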
And now, surprise: there is one more thing that breaks the "singleton-ness" of classes: dill deserialization itself! On loading a file, it executes a class body again – that is the only possible mechanism to make a class available in a project where its code is not present (which is dill's purpose).
If you do not need dill to actually serialize the classes, and only turned to dill instead of pickle to be able to serialize the singleton, you can either use pickle, or call dill.dump with the byref=True optional argument: this will avoid serializing the classes themselves, and the code above will work.
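For example, a sketch that simply swaps the dump call in the pickle version above for dill:

```python
import dill

with open("statefile.dill", 'wb') as statehandle:
    # store only a reference to TestClass, like pickle would,
    # instead of serializing the class definition itself
    dill.dump(tobj, statehandle, byref=True)
```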
Otherwise: this is the second time ever I have needed a second-order metaclass, in order to avoid dill's class duplication:
```python
import os.path
import sys

import dill


class SingletonMetaMeta(type):
    def __new__(mcls, name, bases, namespace, **kw):
        # if dill re-executes the metaclass body on load, reuse the
        # metaclass object that already exists in the target module:
        mod = sys.modules[namespace["__module__"]]
        if inprocess_metaclass := getattr(mod, name, None):
            return inprocess_metaclass
        return super().__new__(mcls, name, bases, namespace, **kw)


def getkey(cls):
    return f"{cls.__module__}.{cls.__qualname__}"


class SingletonMeta(type, metaclass=SingletonMetaMeta):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # The metaclass __call__ is actually the only way of preventing '__init__'
        # from being run for new instantiations.
        # For ordinary usage of singletons that do not need to preserve state
        # across serialization/deserialization, the approach of creating a single
        # instance of an ordinary class would work.
        mcls = type(cls)
        if getkey(cls) not in mcls._instances:
            # all that type.__call__ does is call the cls' __new__ and its __init__ in sequence.
            instance = cls.__new__(cls, *args, **kwargs)
            # the pickling protocol ordinarily won't run this __call__ method,
            # so we can always call __init__
            instance.__init__(*args, **kwargs)
        else:
            instance = mcls._instances[getkey(cls)]
        return instance


class SingletonBase(metaclass=SingletonMeta):
    def __new__(cls, *args, **kwargs):
        # Check if an instance exists at the metaclass.
        # The pickling protocol calls this __new__ in a standalone way,
        # in order to avoid re-running the class "__init__". It does not
        # rely on the metaclass __call__ (which normal instantiation does),
        # because that would always run __new__ and __init__.
        # Since the singleton can be created in two ways – called from code,
        # or unserialized – we replicate the instantiate-and-cache bit here:
        mcls = type(cls)
        instance = mcls._instances.get(getkey(cls))
        if not instance:
            instance = mcls._instances[getkey(cls)] = super().__new__(cls, *args, **kwargs)
        return instance


class TestClass(SingletonBase):
    def __init__(self):
        print("at init")
        self.testatrr = "init run"

    def set_method(self):
        self.testatrr = "set run"

    def get_method(self):
        print(self.testatrr)


if os.path.isfile("statefile.dill"):
    with open("statefile.dill", 'rb') as statehandle:
        print("unpickling")
        tobj = dill.load(statehandle)
else:
    tobj = TestClass()
    tobj.set_method()

tobj = TestClass()  # init Shouldn't get called
tobj.get_method()

with open("statefile.dill", 'wb') as statehandle:
    dill.dump(tobj, statehandle)
```
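The SingletonMetaMeta is what neutralizes dill's class duplication: when dill re-executes the SingletonMeta body on load, the meta-metaclass hands back the metaclass object already living in the module instead of creating a new one, and getkey() keys the instance cache by qualified name rather than by class object, so the cache survives the class itself being re-created.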
Possible extra issues:
- If your singletons do take extra arguments in their __init__, those will show up in the __new__ method as well, but should not be forwarded to object.__new__. Simply do super().__new__(cls) in the base class's __new__ (see the sketch after this list).
- You mentioned an existing codebase where you did not want to replace the singleton mechanism. If this means you can't insert the base class from these snippets by ordinary means, then the __new__ method on the metaclass should be written to include a __new__ method on the singletons (either by inserting the current base class as a mixin, or by injecting a __new__ method into it). In that case, please ask a follow-up question and add the "metaclass" tag; I should see it later.
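A sketch of that first point, assuming the SingletonMeta and getkey machinery from the dill version above (the Configured class is a made-up example):

```python
class SingletonBase(metaclass=SingletonMeta):
    def __new__(cls, *args, **kwargs):
        mcls = type(cls)
        instance = mcls._instances.get(getkey(cls))
        if not instance:
            # swallow *args/**kwargs here: object.__new__ rejects extra
            # arguments when __new__ is overridden
            instance = mcls._instances[getkey(cls)] = super().__new__(cls)
        return instance


class Configured(SingletonBase):
    def __init__(self, name="default"):  # extra __init__ arguments now work
        self.name = name
```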