AttributeError: format not found – pyodide + joblib.dump + scikit-learn (TfidfVectorizer)




I have pickled an SMS spam prediction model (a classifier and its vectorizer) using pickle. Now I want to use Pyodide to load the model in the browser.

I have loaded the pickled file using pickle.loads in the browser:

console.log("Pyodide loaded, downloading pretrained ML model...")
const model = (await blobToBase64(await (await fetch("/model.pkl")).blob())).replace("data:application/octet-stream;base64,", "")
console.log("Loading model into Pyodide...")
await pyodide.loadPackage("scikit-learn")
await pyodide.loadPackage("joblib")
pyodide.runPython(`
    import base64
    import pickle
    from io import BytesIO
    classifier, vectorizer = pickle.loads(base64.b64decode('${model}'))
`)

This works.

But, when I try to call:

const prediction = pyodide.runPython(`
    vectorized_message = vectorizer.transform(["Call +172949 if you want to get $1000 immediately!!!!"])
    classifier.predict(vectorized_message)[0]
`)

It raises an error in vectorizer.transform: AttributeError: format not found

Full error dump is below.

Uncaught (in promise) Error: Traceback (most recent call last):
  File "/lib/python3.8/site-packages/pyodide/_base.py", line 70, in eval_code
    eval(compile(mod, "<exec>", mode="exec"), ns, ns)
  File "<exec>", line 2, in <module>
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1899, in transform
    return self._tfidf.transform(X, copy=False)
  File "/lib/python3.8/site-packages/sklearn/feature_extraction/text.py", line 1513, in transform
    X = X * self._idf_diag
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 319, in __mul__
    return self._mul_sparse_matrix(other)
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 478, in _mul_sparse_matrix
    other = self.__class__(other)  # convert to this format
  File "/lib/python3.8/site-packages/scipy/sparse/compressed.py", line 28, in __init__
    if arg1.format == self.format and copy:
  File "/lib/python3.8/site-packages/scipy/sparse/base.py", line 525, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: format not found

    _hiwire_throw_error https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    __runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    _runPythonInternal https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    runPython https://cdn.jsdelivr.net/pyodide/v0.16.1/full/pyodide.asm.js:8
    <anonymous> http://localhost/:41
    async* http://localhost/:46
pyodide.asm.js:8:39788

In Python it works fine though.

What might I be doing wrong?

Answer

It’s likely a pickle portability issue. Pickles should be portable between architectures¹ (here amd64 and wasm32); however, they are not portable across package versions. This means that package versions should be identical between the environment where you train your model and the one where you do the inference (pyodide).
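One way to make that requirement explicit is to store the library versions next to the pickled model and refuse to unpickle when they differ. The following is only a minimal sketch: the helper names dump_with_versions and load_checked are hypothetical, and the versions dictionary is whatever you choose to record at training time.

```python
import pickle


def dump_with_versions(obj, versions):
    """Pickle `obj` together with the library versions used to create it."""
    # The model is pickled separately so the outer layer holds only builtin
    # types and can always be read, whatever library versions are installed.
    return pickle.dumps({"versions": dict(versions), "payload": pickle.dumps(obj)})


def load_checked(blob, runtime_versions):
    """Unpickle the model only if the recorded versions match the runtime ones."""
    bundle = pickle.loads(blob)
    recorded = bundle["versions"]
    current = dict(runtime_versions)
    if recorded != current:
        raise RuntimeError(
            f"version mismatch: trained with {recorded}, running {current}"
        )
    return pickle.loads(bundle["payload"])
```

With this, a mismatch shows up as a clear RuntimeError at load time instead of an obscure AttributeError later, inside transform.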

pyodide 0.16.1 includes Python 3.8.2, scipy 0.17.1 and scikit-learn 0.22.2. Unfortunately, this means that you will have to build that version of scipy (and possibly numpy) from source to train the model, since no Python 3.8 binary wheel exists for such an outdated version of scipy. In the future this should be resolved with pyodide#1293.
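To see how far a training environment is from those versions, a small check like the one below can be run in both places (regular Python and inside pyodide.runPython). It is a sketch, not an official tool: the function names are made up for illustration, and it assumes each module exposes a __version__ attribute, which scipy and sklearn do.

```python
import importlib

# Versions that this answer reports for pyodide 0.16.1.
PYODIDE_0_16_1 = {"scipy": "0.17.1", "sklearn": "0.22.2"}


def installed_versions(modules=("scipy", "sklearn")):
    """Map each module name to its __version__ (None if it is unavailable)."""
    versions = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # module is not installed in this interpreter
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def version_mismatches(expected, actual):
    """Return {name: (expected, actual)} for every module whose versions differ."""
    names = sorted(set(expected) | set(actual))
    return {
        name: (expected.get(name), actual.get(name))
        for name in names
        if expected.get(name) != actual.get(name)
    }
```

Calling version_mismatches(PYODIDE_0_16_1, installed_versions()) in the training environment returns an empty dict only when that environment matches the versions listed above.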

The particular error you are getting is likely due to a scipy.sparse version mismatch; see scipy#6533.
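If you want to confirm this in the browser before rebuilding anything, you can inspect the unpickled sparse matrix that the traceback points at. The helper below is a hypothetical diagnostic, and the attribute path vectorizer._tfidf._idf_diag is taken from the traceback in the question (private attributes, so it may differ in other scikit-learn versions).

```python
def missing_attributes(obj, names=("format", "shape", "dtype")):
    """Return the attribute names from `names` that `obj` does not expose."""
    # hasattr() returns False when the lookup raises AttributeError, which is
    # what the "format not found" error in the traceback is.
    return [name for name in names if not hasattr(obj, name)]


# Inside pyodide.runPython, after unpickling (names from the question):
#     missing_attributes(vectorizer._tfidf._idf_diag)
```

If "format" appears in the returned list, the matrix was restored without the attribute that the scipy version bundled with pyodide expects, which would be consistent with the version mismatch described above.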

¹ Tree-based models in scikit-learn, however, are currently not portable across architectures, and so won’t unpickle in pyodide. This is a known bug that should be fixed (scikit-learn#19602).



Source: stackoverflow