I get the following error when I’m reading my .pkl files on spyder (python 3.6.5):
IN:

    with open(file, "rb") as f:
        data = pickle.load(f)

    Traceback (most recent call last):
      File "<ipython-input-5-d9796b902b88>", line 2, in <module>
        data = pickle.load(f)
    AttributeError: Can't get attribute 'Signal' on <module '__main__' from 'C:\Python36\lib\site-packages\spyder\utils\ipython\start_kernel.py'>
My program is made of one file:
In the program, a class Signal is defined, along with many functions. A simplified overview of the program is provided below:
    import numpy as np
    import _pickle as pickle
    import os

    # The unique class
    class Signal:
        def __init__(self, fq, t0, tf):
            self.fq = fq
            self.t0 = t0
            self.tf = tf
            self.timeline = np.round(np.arange(t0, tf, 1/fq*1000), 3)

    # The functions
    def write_file(data, folder_path, file_name):
        # -1 selects the highest pickle protocol available
        with open(os.path.join(folder_path, file_name), "wb") as output:
            pickle.dump(data, output, -1)

    def read_file(folder_path, file_name):
        with open(os.path.join(folder_path, file_name), "rb") as input_file:
            data = pickle.load(input_file)
        return data

    def compute_data():  # parameters omitted
        ...  # do stuff
compute_data will return a list of tuples of the form:
data = [((Signal_1_1, Signal_1_2, ...), val 1), ((Signal_2_1, Signal_2_2, ...), val 2)...]
Each Signal_i_k is, of course, a Signal object. This list is saved in .pkl format. Moreover, I'm running many iterations of the compute_data function with different parameters. Many iterations use previously computed data as a starting point, and thus read the corresponding .pkl files.
Finally, I’m using several computers at the same time, each of them saving the computed data on the local network. Thus each computer can access the data generated by the others and use it as a starting point.
Back to the error:
My main issue is that I never get this error when I start my program by double-clicking the file or from the Windows cmd or PowerShell. The program never crashes with this error and runs without apparent issues.
However, I cannot read a .pkl file in Spyder. Every time I try, the error is thrown.
Any idea why I got this weird behavior?
When you dump stuff in a pickle you should avoid pickling classes and functions declared in the main module. Your problem is (in part) because you only have one file in your program.
pickle is lazy and does not serialize class definitions or function definitions. Instead it saves a reference of how to find the class (the module it lives in and its name).
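To see that only a reference is stored, here is a minimal sketch. It pickles a standard-library object (fractions.Fraction is used purely for illustration and is not part of the original program) and checks that the payload carries the module name and the class name.

```python
import pickle
from fractions import Fraction

# Protocol 2 is chosen only to keep the payload small and easy to read.
payload = pickle.dumps(Fraction(1, 2), protocol=2)

# The payload names the module and the class; the class body is not stored.
has_module = b"fractions" in payload
has_class = b"Fraction" in payload
print(has_module, has_class)
print(len(payload), "bytes")
```

Whoever loads this payload must be able to import fractions and find Fraction in it; the same holds for your own classes and the module they were defined in.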
When Python runs a script/file directly it runs the program as the __main__ module (regardless of its actual file name). However, when a file is loaded and is not the main module (e.g. when you do something like import program) then its module name is based on its file name. So program.py gets called program.
When you are running from the command line you are doing the former, and the module is called __main__. As such, pickle creates references to your classes like __main__.Signal. When Spyder tries to load the pickle file it gets told to import __main__ and look for Signal. But Spyder's __main__ module is the module that is used to start Spyder and not your program.py, and so pickle fails to find Signal.
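The failure can be reproduced without Spyder. The sketch below is an illustration only: it loads a hand-written pickle that refers to a class in __main__ which the running process does not define (the name SignalDefinedElsewhere is hypothetical).

```python
import pickle

# Hand-written protocol-2 pickle: PROTO 2, GLOBAL '__main__ SignalDefinedElsewhere',
# EMPTY_TUPLE, NEWOBJ, STOP. The class name is made up for this example.
payload = b"\x80\x02c__main__\nSignalDefinedElsewhere\n)\x81."

try:
    pickle.loads(payload)
    error = None
except AttributeError as exc:
    # pickle imported __main__ but could not find the class in it.
    error = str(exc)

print(error)
```

The message has the same shape as the one in the question: pickle found a __main__ module, just not the one that defines the class.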
You can inspect the contents of a pickle file by running the command below (-a prints a description of each command). From its output you will see that your class is being referenced as __main__.Signal.

    python -m pickletools -a file.pkl
And you’ll see something like:
        0: \x80 PROTO      3 Protocol version indicator.
        2: c    GLOBAL     '__main__ Signal' Push a global object (module.attr) on the stack.
       19: q    BINPUT     0 Store the stack top into the memo.  The stack is not popped.
       21: )    EMPTY_TUPLE Push an empty tuple.
       22: \x81 NEWOBJ     Build an object instance.
       23: q    BINPUT     1 Store the stack top into the memo.  The stack is not popped.
       ...
       51: b    BUILD      Finish building an object, via __setstate__ or dict update.
       52: .    STOP       Stop the unpickling machine.
    highest protocol among opcodes = 2
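The same inspection can be done from Python with pickletools.genops. The helper below, referenced_globals, is a hypothetical name and only a rough sketch: for protocol 4 and later it relies on a heuristic (the two most recent string arguments), so it can miss references whose names were fetched from the memo.

```python
import pickle
import pickletools
from fractions import Fraction

def referenced_globals(payload):
    """Return 'module.name' strings for the globals a pickle refers to."""
    found = []
    recent_strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(payload):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is 'module name' in a single string.
            module, name = arg.split(" ", 1)
            found.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two strings pushed just before.
            # Heuristic: assumes neither of them came from the memo.
            module, name = recent_strings[-2:]
            found.append(f"{module}.{name}")
        elif isinstance(arg, str):
            recent_strings.append(arg)
    return found

# Fraction is only a stand-in for a user-defined class such as Signal.
print(referenced_globals(pickle.dumps(Fraction(1, 2), protocol=2)))
print(referenced_globals(pickle.dumps(Fraction(1, 2), protocol=4)))
```

To check a file, read it in binary mode and pass the bytes to the helper; an entry starting with __main__ means the file will only load where __main__ defines that class.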
There are a number of solutions available to you:
- Don’t serialise instances of classes that are defined in your __main__ module. This is the easiest and best solution. Instead, move these classes to another module, or write a main.py script to invoke your program (both will mean such classes are no longer found in the __main__ module).
- Write a custom deserialiser
- Write a custom serialiser
The following solutions work with a pickle file called out.pkl, created by the following code (in a file called program.py):

    import pickle

    class MyClass:
        def __init__(self, name):
            self.name = name

    if __name__ == '__main__':
        o = MyClass('test')
        with open('out.pkl', 'wb') as f:
            pickle.dump(o, f)
The Custom Deserialiser Solution
You can write a custom deserialiser that knows that, when it encounters a reference to the __main__ module, what you really mean is the program module.
    import pickle

    class MyCustomUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            # Redirect references to __main__ to the importable program module
            if module == "__main__":
                module = "program"
            return super().find_class(module, name)

    with open('out.pkl', 'rb') as f:
        unpickler = MyCustomUnpickler(f)
        obj = unpickler.load()

    print(obj)
    print(obj.name)
This is the easiest way to load pickle files that have already been created. The problem is that it pushes the responsibility onto the deserialising code, when it should really be the responsibility of the serialising code to create pickle files correctly.
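If several modules have been renamed, the same idea can be driven by a lookup table. This is a sketch under assumptions: RenamingUnpickler and its renames mapping are hypothetical names, and the payload is hand-written so the example runs without any file on disk. It redirects __main__ to collections only so that the lookup succeeds here; for the files above the mapping would be __main__ to program.

```python
import io
import pickle

class RenamingUnpickler(pickle.Unpickler):
    """Redirect lookups from old module names to new ones while loading."""

    # Illustrative mapping only; for out.pkl it would be {"__main__": "program"}.
    renames = {"__main__": "collections"}

    def find_class(self, module, name):
        # Fall back to the original module name when no rename is registered.
        return super().find_class(self.renames.get(module, module), name)

# Hand-written protocol-2 pickle that asks for '__main__ OrderedDict' and calls
# it with no arguments (PROTO 2, GLOBAL, EMPTY_TUPLE, REDUCE, STOP).
payload = b"\x80\x02c__main__\nOrderedDict\n)R."

obj = RenamingUnpickler(io.BytesIO(payload)).load()
print(type(obj).__module__, type(obj).__name__)
```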
The Custom Serialisation Solution
In contrast to the previous solution, you can make sure that serialised pickle objects can be deserialised easily by anyone without having to know the custom deserialisation logic. To do this you can use the copyreg module to inform pickle how to deserialise various classes. So here, what you would do is tell pickle to deserialise all instances of __main__ classes as if they were instances of program classes. You will need to register a custom serialiser for each class:
    import program
    import pickle
    import copyreg

    class MyClass:
        def __init__(self, name):
            self.name = name

    def pickle_MyClass(obj):
        assert type(obj) is MyClass
        # Reconstruct the object via program.MyClass rather than __main__.MyClass
        return program.MyClass, (obj.name,)

    copyreg.pickle(MyClass, pickle_MyClass)

    if __name__ == '__main__':
        o = MyClass('test')
        with open('out.pkl', 'wb') as f:
            pickle.dump(o, f)
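The effect of the copyreg registration can be checked in a single process. The sketch below is an approximation, not the code above: program_demo is a hypothetical in-memory stand-in for the importable program module, created with types.ModuleType so that no extra file is needed.

```python
import copyreg
import pickle
import sys
import types

# Stand-in for the importable 'program' module (hypothetical name: program_demo).
program_demo = types.ModuleType("program_demo")
exec(
    "class MyClass:\n"
    "    def __init__(self, name):\n"
    "        self.name = name\n",
    program_demo.__dict__,
)
sys.modules["program_demo"] = program_demo

# Stand-in for the copy of the class that lives in the running script.
class MyClass:
    def __init__(self, name):
        self.name = name

def pickle_MyClass(obj):
    # Rebuild the object through the importable class instead of this one.
    return program_demo.MyClass, (obj.name,)

copyreg.pickle(MyClass, pickle_MyClass)

payload = pickle.dumps(MyClass("test"), protocol=2)
restored = pickle.loads(payload)
print(type(restored).__module__, restored.name)
```

The payload now refers to program_demo.MyClass, so a plain pickle.load works anywhere that module can be imported, with no custom unpickler needed.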