
AttributeError: format not found – pyodide + joblib.dump + scikit-learn (TfidfVectorizer)

I have pickled an SMS spam prediction model using pickle. Now I want to load the model in the browser with Pyodide.

I have loaded the pickled file using pickle.loads in the browser:

JavaScript

This works.

But, when I try to call:

JavaScript

It raises an error in vectorizer.transform: AttributeError: format not found

The full error dump is below.

JavaScript

In regular (native) Python, however, it works fine.

What might I be doing wrong?


Answer

This is most likely a pickle portability issue. Pickles should be portable between architectures¹ (here amd64 and wasm32), but they are not portable across package versions. This means the package versions must be identical in the environment where you train your model and the one where you run inference (Pyodide).
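One way to catch such a mismatch early is to store the library versions next to the model and compare them before unpickling the model itself. The sketch below is only an illustration, not a feature of Pyodide, pickle or scikit-learn: the wrapper format and the helper names (current_versions, dump_with_versions, load_checked) are hypothetical, and it assumes each pinned module exposes a __version__ attribute.

```python
import importlib
import pickle
import platform

# Hypothetical set of modules to pin for a TfidfVectorizer-based model;
# adjust it to whatever your pipeline actually depends on.
PINNED = ("sklearn", "scipy", "numpy")


def current_versions(modules=PINNED):
    """Return {name: version} for this interpreter (None if not importable)."""
    versions = {"python": platform.python_version()}
    for name in modules:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", None)
        except ImportError:
            versions[name] = None
    return versions


def dump_with_versions(obj, modules=PINNED):
    """Pickle obj together with the versions it was created under.

    The model is pickled separately (as bytes) so the version record can be
    read back even when the model itself would fail to unpickle.
    """
    return pickle.dumps({
        "versions": current_versions(modules),
        "model": pickle.dumps(obj),
    })


def load_checked(blob):
    """Unpickle the model only if every recorded version matches this one."""
    wrapper = pickle.loads(blob)  # a plain dict of str / bytes / None
    saved = wrapper["versions"]
    now = current_versions(tuple(k for k in saved if k != "python"))
    mismatches = {k: (saved[k], now[k]) for k in saved if saved[k] != now[k]}
    if mismatches:
        raise RuntimeError(f"version mismatch (saved, current): {mismatches}")
    return pickle.loads(wrapper["model"])
```

You would call dump_with_versions(vectorizer) in the training script and load_checked(data) inside Pyodide; a mismatch then fails immediately with an explicit message instead of surfacing later as an obscure AttributeError.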

Pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means you will have to build that version of scipy (and possibly numpy) from source to train the model, since no Python 3.8 binary wheel exists for such an outdated version of scipy. In the future this should be resolved by pyodide#1293.

The particular error you are getting is likely due to a scipy.sparse version mismatch; see scipy#6533.
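To illustrate how a version mismatch can surface as exactly this message, here is a deliberately simplified, hypothetical model; it is not scipy's actual code. It assumes that newer scipy keeps format at class level, so it is absent from the pickled instance state, while older scipy set format per instance and its sparse-matrix __getattr__ raised AttributeError(attr + " not found") for unknown names. It further assumes that, in the scikit-learn version discussed here, TfidfVectorizer keeps its idf weights in a scipy.sparse matrix, which would explain why transform is where it breaks.

```python
import pickle


class NewMatrix:
    # Newer-style class (assumption): format lives on the class, so it is
    # not stored in the instance __dict__ and is not part of the pickled state.
    format = "dia"

    def __init__(self, data):
        self.data = data


class OldMatrix:
    # Older-style class (assumption): format is set per instance in __init__,
    # and any attribute not found normally falls through to __getattr__.
    def __init__(self, data):
        self.data = data
        self.format = "dia"

    def __getattr__(self, attr):
        raise AttributeError(attr + " not found")


def load_state_into(cls, blob):
    """Mimic unpickling: restore saved state without calling __init__."""
    obj = cls.__new__(cls)
    obj.__dict__.update(pickle.loads(blob))
    return obj


# State saved by the "new" class, restored into the "old" class.
state_blob = pickle.dumps(NewMatrix([1.0, 2.0]).__dict__)
old = load_state_into(OldMatrix, state_blob)
try:
    old.format
except AttributeError as exc:
    print(exc)  # prints: format not found
```

Because unpickling restores state without running __init__, the per-instance attribute the old class relies on is simply never created.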

¹Tree-based models in scikit-learn, however, are currently not portable across architectures and so won't unpickle in Pyodide. This is a known bug that should be fixed (scikit-learn#19602).

User contributions licensed under: CC BY-SA