
Dill doesn’t seem to respect metaclass

I have recently begun working with dill. I have a metaclass which I use to implement a Singleton pattern, so that there is always at most one instance at any given time. I am using dill to serialise. The problem is that once the object is loaded back, it doesn’t respect the Singleton pattern (enforced by the metaclass) and __init__ gets called.

Here is the code which can reproduce the issue

import os.path
import dill

class SingletonBase(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(SingletonBase, cls).__call__(*args, **kwargs)
        return cls._instances[cls]



class TestClass(metaclass=SingletonBase) :
    def __init__(self):
        self.testatrr = "hello"

    def set_method(self):
        self.testatrr = "hi"

    def get_method(self):
        print(self.testatrr)


if os.path.isfile("statefile.dill"):
    with open("statefile.dill", 'rb') as statehandle:
        tobj = dill.load(statehandle)
else:
    tobj = TestClass()


tobj.set_method()
tobj = TestClass()  # __init__ shouldn't get called
tobj.get_method()

with open("statefile.dill", 'wb') as statehandle:
    dill.dump(tobj, statehandle)

On the first run __init__ is called only once, so tobj.get_method() would print “hi”. But on the second run, when tobj is loaded from dill, the call to TestClass() triggers __init__ again. Is there any way to fix this? To get dill to honour the metaclass?

I understand that a Singleton like this is really not needed in Python. But I have gone too far now, with thousands of lines of code. Hoping to find a way out without a rewrite. Would really appreciate your help.


Answer

So, first of all: when serializing an ordinary object, its ordinary initialization mechanism – that is, calling its __init__ – will not be run upon unserializing it (via pickle.load or dill.load). That is the desired outcome: you want the object’s previous state, and you do not want to trigger any initialization side effects.
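This can be checked with plain pickle and a tiny throwaway class (the Probe name is just for illustration):

```python
import pickle

INIT_CALLS = 0

class Probe:
    def __init__(self):
        global INIT_CALLS
        INIT_CALLS += 1
        self.state = "fresh"

p = Probe()
p.state = "mutated"

q = pickle.loads(pickle.dumps(p))  # __init__ is NOT run here
assert INIT_CALLS == 1             # only the original construction ran __init__
assert q.state == "mutated"        # the previous state was restored
```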

When unserializing an instance of a class that has a metaclass, dill aims for the same outcome: it will NOT run the metaclass’s __call__, as that would trigger the initialization side effects.

The problem lies in the fact that, in this arrangement, what guarantees the “singleton” is precisely a side effect of class instantiation: not of class creation (which would mean moving the duplicate check into the metaclass’s __init__), but of instantiating the already created class – which is when the metaclass’s __call__ runs. This call is correctly skipped by dill in the tobj = dill.load(statehandle) line.

So, when you try to create a new instance of TestClass below, the _instances registry is empty, and a new instance is created.

Now – that is what would take place with an ordinary “pickle”, not with “dill” (see below).

Going back to your singleton: you have to keep in mind that at some point the singleton is actually created for the first time in the running process. When unpickling an object meant to behave as a singleton, if the code can detect that an instance already exists, it can reuse it.

However, unpickling skips the normal instantiation through the metaclass __call__ and runs the class’s __new__ directly. So the class’s __new__ must be aware of the singleton mechanism, which means that, regardless of the metaclass, a base class with a __new__ method is needed. Since we want to avoid re-running __init__ on repeated instantiations, we still need the metaclass __call__; otherwise Python would call __init__ on every ordinary (non-unpickling) instantiation. The base class’s __new__ thus has to collaborate with the metaclass’s __call__ in using the cache mechanism.

Ordinary unpickling, after calling __new__ on the class, restores the instance state by updating its namespace, which is exposed in the __dict__ attribute.
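In essence, unpickling does something equivalent to this hand-rolled sketch (the Point class here is hypothetical, just to show the mechanism):

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# what unpickling effectively does: bypass __init__, then restore the namespace
saved_state = {"x": 3, "y": 4}
restored = Point.__new__(Point)    # __init__ is never called
restored.__dict__.update(saved_state)
assert (restored.x, restored.y) == (3, 4)
```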

With a collaborative metaclass __call__ and base class __new__ in place, the serialized singleton works with ordinary pickle:

import os.path

import pickle

class SingletonMeta(type):
    _instances = {}

    def __call__(cls, *args, **kwargs):

        mcls = type(cls)
        if cls not in mcls._instances:
            # all that type.__call__ does is call the cls' __new__ and its __init__ in sequence:
            instance = cls.__new__(cls, *args, **kwargs)
            instance.__init__(*args, **kwargs)
        else:
            instance = mcls._instances[cls]
        return instance

class SingletonBase(metaclass=SingletonMeta):

    def __new__(cls, *args, **kwargs):
        mcls = type(cls)
        instance = mcls._instances.get(cls)
        if not instance:
            instance = mcls._instances[cls] = super().__new__(cls, *args, **kwargs)

        return instance


class TestClass(SingletonBase):
    def __init__(self):
        print("at init")
        self.testatrr = "init run"

    def set_method(self):
        self.testatrr = "set run"

    def get_method(self):
        print(self.testatrr)

if os.path.isfile("statefile.pickle"):
    with open("statefile.pickle", 'rb') as statehandle:
        print("unpickling")
        tobj = pickle.load(statehandle)
else:
    tobj = TestClass()

tobj.set_method()
tobj = TestClass()  # __init__ shouldn't get called
tobj.get_method()

with open("statefile.pickle", 'wb') as statehandle:
    pickle.dump(tobj, statehandle)

Using dill

The fact that dill, by default, actually serializes each object’s class itself (it does that by putting the class’s source code in the serialized file and re-executing it on deserializing) complicates things by another order of magnitude.

What happens is related to “singleton” behavior in Python: as I wrote in the comment, complicating the pattern is not healthy, because when you bind a variable at module level (formally called a “global” variable, though it differs from “global” in other languages, as it is scoped to the module), you already have a “singleton”. And the language uses this behavior all the time: if you think about it, any class or function in Python is already a “singleton”.

No special mechanism is needed to guarantee that classes and functions are singletons: they are created by the def and class statements in the module body, which are executed exactly once. (If you look around Stack Overflow, you will see people getting weird errors in Python when they manage to import the same module twice by misusing the import mechanism, though.)
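You can see this de-facto singleton behavior directly:

```python
import math as m1
import math as m2

assert m1 is m2   # the module object is created once and cached in sys.modules

from math import sqrt as s1
from math import sqrt as s2

assert s1 is s2   # the same function object every time: a de-facto singleton
```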

And now, surprise: there is one more thing that breaks the “singleton-ness” of classes: dill deserialization itself! On loading a file, it executes the class body again – that is the only mechanism possible to make a class available in a project where its code is not present (which is dill’s purpose).
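Plain pickle never does this: it serializes only a reference to the class (its qualified name), so the very same class object is reused on load – which is the behavior dill’s byref=True option opts into. A quick check with a stdlib class:

```python
import pickle
from fractions import Fraction

f = Fraction(3, 4)
restored = pickle.loads(pickle.dumps(f))

# pickle stored only "fractions.Fraction" as a reference, not the class body,
# so the restored instance uses the exact same class object
assert type(restored) is Fraction
assert restored == Fraction(3, 4)
```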

If you do not need dill to actually serialize the classes, and only reached for dill instead of pickle to be able to serialize the singleton, you can either use pickle, or call dill.dump with the byref=True optional argument: this avoids serializing the classes themselves, and the code above will work. Otherwise, this is only the second time ever I have needed a second-order metaclass, here in order to avoid dill’s class duplication:

import os.path
import dill


import sys

class SingletonMetaMeta(type):
    def __new__(mcls, name, bases, namespace, **kw):
        mod = sys.modules[namespace["__module__"]]
        if inprocess_metaclass := getattr(mod, name, None):
            return inprocess_metaclass
        return super().__new__(mcls, name, bases, namespace, **kw)


def getkey(cls):
    return f"{cls.__module__}.{cls.__qualname__}"


class SingletonMeta(type, metaclass=SingletonMetaMeta):
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # The metaclass __call__ is actually the only way of preventing __init__
        # from being run on every new instantiation.
        # For ordinary usage of singletons that do not need to preserve state across
        # serialization/deserialization, the approach of creating a single instance
        # of an ordinary class would work.
        mcls = type(cls)
        if getkey(cls) not in mcls._instances:
            # all that type.__call__ does is call the cls' __new__ and its __init__ in sequence.
            instance = cls.__new__(cls, *args, **kwargs)
            # the pickling protocol ordinarily won't run this __call__ method, so we
            # can always call __init__
            instance.__init__(*args, **kwargs)
        else:
            instance = mcls._instances[getkey(cls)]
        return instance

class SingletonBase(metaclass=SingletonMeta):

    def __new__(cls, *args, **kwargs):
        # check if an instance exists at the metaclass.
        # the pickling protocol calls this __new__ in a
        # standalone way, in order to avoid re-running
        # the class's __init__. It does not rely on
        # the metaclass __call__, which normal instantiation does,
        # because that would always run __new__ and __init__.

        # since the singleton can be created in two ways, called from code
        # or unserialized, we replicate the instantiate-and-cache bit:
        mcls = type(cls)
        instance = mcls._instances.get(getkey(cls))
        if not instance:
            instance = mcls._instances[getkey(cls)] = super().__new__(cls, *args, **kwargs)

        return instance


class TestClass(SingletonBase):
    def __init__(self):
        print("at init")
        self.testatrr = "init run"

    def set_method(self):
        self.testatrr = "set run"

    def get_method(self):
        print(self.testatrr)

if os.path.isfile("statefile.dill"):
    with open("statefile.dill", 'rb') as statehandle:
        print("unpickling")
        tobj = dill.load(statehandle)
else:
    tobj = TestClass()

tobj.set_method()
tobj = TestClass()  # __init__ shouldn't get called
tobj.get_method()

with open("statefile.dill", 'wb') as statehandle:
    dill.dump(tobj, statehandle)

Possible extra issues:

  1. If your singletons do take extra arguments in their __init__, those will show up in the __new__ method as well, but should not be forwarded to object.__new__. Simply do super().__new__(cls) in the base class’s __new__.

  2. You mentioned an existing codebase where you do not want to replace the singleton mechanism. If this means you can’t insert the base class from these snippets by ordinary means, then the metaclass’s __new__ should be written to add a __new__ method to the singleton classes themselves (either by inserting the current base class as a mixin, or by injecting a __new__ method into them). In that case, please ask a follow-up question and add the “metaclass” tag; I should see it later.
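The injection route in item 2 could be sketched like this (a minimal sketch, not tested against a real codebase; all names here are hypothetical):

```python
class InjectingSingletonMeta(type):
    """Metaclass that injects a singleton-aware __new__ into each class it
    creates, so no explicit base class has to be edited into the hierarchy."""
    _instances = {}

    def __new__(mcls, name, bases, namespace, **kw):
        def singleton_new(cls, *args, **kwargs):
            inst = InjectingSingletonMeta._instances.get(cls)
            if inst is None:
                inst = InjectingSingletonMeta._instances[cls] = object.__new__(cls)
            return inst
        # only inject if the class does not define its own __new__
        namespace.setdefault("__new__", singleton_new)
        return super().__new__(mcls, name, bases, namespace, **kw)

    def __call__(cls, *args, **kwargs):
        first_time = cls not in InjectingSingletonMeta._instances
        instance = cls.__new__(cls, *args, **kwargs)
        if first_time:
            instance.__init__(*args, **kwargs)  # run __init__ only once
        return instance


class Config(metaclass=InjectingSingletonMeta):
    def __init__(self):
        self.inits = getattr(self, "inits", 0) + 1

c1 = Config()
c2 = Config()
assert c1 is c2        # same object both times
assert c1.inits == 1   # __init__ ran exactly once
```

Unpickling such a class would still go through the injected __new__ (and thus the cache), just as with the explicit base class version above.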
