I have pickled an SMS spam prediction model using pickle. Now I want to use Pyodide to load the model in the browser.
I have loaded the pickled file using pickle.loads in the browser:
```javascript
console.log("Pyodide loaded, downloading pretrained ML model...")
const model = (await blobToBase64(await (await fetch("/model.pkl")).blob())).replace("data:application/octet-stream;base64,", "")
console.log("Loading model into Pyodide...")
await pyodide.loadPackage("scikit-learn")
await pyodide.loadPackage("joblib")
pyodide.runPython(`
import base64
import pickle
from io import BytesIO
classifier, vectorizer = pickle.loads(base64.b64decode('${model}'))
`)
```
This works.
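For reference, the pickle-then-base64 round trip that the snippet above relies on can be sketched entirely in Python. This is a minimal sketch: encode_model and decode_model are hypothetical helper names, and any picklable objects can stand in for the fitted classifier and vectorizer. The base64 text contains only letters, digits, +, / and =, which is why it can be embedded safely in the JavaScript template literal.

```python
import base64
import pickle


def encode_model(classifier, vectorizer):
    """Training side (hypothetical helper): pickle the pair, then base64-encode it as text."""
    return base64.b64encode(pickle.dumps((classifier, vectorizer))).decode("ascii")


def decode_model(b64_text):
    """Inference side (hypothetical helper): reverse encode_model.

    The tuple order must match the one used when pickling.
    """
    classifier, vectorizer = pickle.loads(base64.b64decode(b64_text))
    return classifier, vectorizer


# Usage with stand-in objects instead of real scikit-learn estimators.
text = encode_model({"kind": "classifier"}, {"kind": "vectorizer"})
clf, vec = decode_model(text)
```

Decoding only reproduces working objects if the unpickling environment has compatible versions of the libraries that defined them.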
But when I try to call:
```javascript
const prediction = pyodide.runPython(`
vectorized_message = vectorizer.transform(["Call +172949 if you want to get $1000 immediately!!!!"])
classifier.predict(vectorized_message)[0]
`)
```
It raises an error (in vectorizer.transform): AttributeError: format not found
Full error dump is below.
```
Uncaught (in promise) Error: Traceback (most recent call last):
  File "/lib/python3.8/site-packages/pyodide/_base.py", line 70, in eval_code
    eval(compile(mod, "<exec>", mode="exec"), ns, ns)
  File "<exec>", line 2, in <module>
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1899, in transform
    return self._tfidf.transform(X, copy=False)
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1513, in transform
    X = X * self._idf_diag
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 319, in __mul__
    return self._mul_sparse_matrix(other)
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 478, in _mul_sparse_matrix
    other = self.__class__(other)  # convert to this format
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 28, in __init__
    if arg1.format == self.format and copy:
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 525, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: format not found
    _hiwire_throw_error https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    __runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    _runPythonInternal https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    <anonymous> http://localhost/:41
    async* http://localhost/:46 pyodide.asm.js:8:39788
```
In regular Python it works fine, though.
What might I be doing wrong?
Answer
It’s likely a pickle portability issue. Pickles should be portable between architectures¹ (here amd64 and wasm32), but they are not portable across package versions. This means that package versions should be identical between the environment where you train your model and the one where you run inference (pyodide).
pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means you will have to build that version of scipy (and possibly numpy) from source to train the model, since no Python 3.8 binary wheel exists for such an outdated version of scipy. In the future this should be resolved by pyodide#1293.
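One way to make a version mismatch fail loudly instead of surfacing deep inside scipy.sparse is to store the library versions next to the pickled model and compare them before unpickling. This is a minimal sketch, not part of the original answer: installed_versions, dumps_with_versions and loads_checked are hypothetical names, and it assumes each checked module exposes __version__ (numpy, scipy and sklearn do). The model is pickled separately so the version record can be read without unpickling any library objects.

```python
import pickle
from importlib import import_module


def installed_versions(modules):
    """Map each module name to its __version__ here, or None if it is unavailable."""
    versions = {}
    for name in modules:
        try:
            versions[name] = getattr(import_module(name), "__version__", None)
        except ImportError:
            versions[name] = None
    return versions


def dumps_with_versions(obj, modules=("numpy", "scipy", "sklearn")):
    """Pickle obj and record the versions of the modules it depends on.

    The outer pickle holds only a dict of strings and bytes, so it can be
    read safely even when the library versions differ.
    """
    return pickle.dumps({
        "versions": installed_versions(modules),
        "model": pickle.dumps(obj),
    })


def loads_checked(data):
    """Unpickle the model only if every recorded version matches this environment."""
    payload = pickle.loads(data)
    saved = payload["versions"]
    current = installed_versions(saved)
    mismatched = {
        name: (saved[name], current[name])
        for name in saved
        if saved[name] != current[name]
    }
    if mismatched:
        raise RuntimeError(f"version mismatch (saved, current): {mismatched}")
    return pickle.loads(payload["model"])
```

In the training environment you would call dumps_with_versions((classifier, vectorizer)), and in Pyodide call loads_checked on the decoded bytes; a mismatch then produces a readable error naming the offending packages at load time.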
The particular error you are getting is likely due to a scipy.sparse version mismatch; see scipy#6533.
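To see how a version mismatch can produce an error like "format not found", here is a simplified, hypothetical illustration in plain Python. No scipy is involved: the module name toy_sparse and both class layouts are invented, and this is only an analogy for the scipy.sparse case, not scipy's actual code. The key fact is that pickle restores an object's instance __dict__ without calling __init__, so if one version keeps an attribute on the class and another version sets it in __init__, an object pickled with the first and loaded with the second simply lacks that attribute.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a library module whose class layout changes between versions.
toy = types.ModuleType("toy_sparse")
sys.modules["toy_sparse"] = toy


def new_init(self, data):
    self.data = data


def old_init(self, data):
    self.data = data
    self.format = "csr"  # the "old" version stores format on each instance


# "New" version: format lives on the class, so it is not part of the pickled state.
toy.csr_matrix = type(
    "csr_matrix", (), {"__module__": "toy_sparse", "format": "csr", "__init__": new_init}
)
blob = pickle.dumps(toy.csr_matrix([1, 2, 3]))

# "Old" version installed under the same name, as in the inference environment.
toy.csr_matrix = type(
    "csr_matrix", (), {"__module__": "toy_sparse", "__init__": old_init}
)

# pickle recreates the object and restores its __dict__ without calling __init__,
# so the attribute the old code expects was never set.
restored = pickle.loads(blob)
print(restored.data)                # [1, 2, 3]
print(hasattr(restored, "format"))  # False
```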
¹Note, however, that tree-based models in scikit-learn are currently not portable across architectures, so they won’t unpickle in pyodide. This is a known bug that should be fixed (scikit-learn#19602).