I have a saved model for sentiment analysis, along with the code and data used to build it. I am trying to create a library that exposes functionality from this code and uses the trained model, but I do not understand how to incorporate the model and the functionality that depends on it.
Can anyone guide me on how to do that specifically?
Edit: Using pickle is the method I went with (answered below)
Answer
You need to know about three things if you want to maintain such a library properly:
- how to build a package
- how to version a package
- how to distribute a package
There are a few ways you could do that; the most user-friendly at the moment is probably poetry, so I'll use it as an example. It needs to be installed if you want to follow this post as a tutorial.
In order to have a very basic project skeleton to work with, I'll just assume that you have something similar to this:
```
modelpersister
├───modelpersister
│   ├───model.pkl
│   ├───__init__.py
│   ├───model_definition.py
│   ├───train.py
│   └───analyze.py
└───pyproject.toml
```
- `model.pkl`: the model artifact that you're going to ship with your package
- `__init__.py`: empty, but it needs to be there to make this folder a Python module
- `model_definition.py`: contains the class definition and the features that define your model
- `train.py`: accepts data to train your model and overwrites the current `model.pkl` file with the result, something roughly like this:
```python
import pickle
from pathlib import Path

from modelpersister.model_definition import SentimentAnalyzer

# overwrite the current model given some new data
def train(data):
    model = SentimentAnalyzer.train(data)
    # open in binary write mode, since pickle writes bytes
    with open(Path(__file__).parent / "model.pkl", "wb") as model_file:
        pickle.dump(model, model_file)
```
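Retraining then just means calling this function with fresh data, for example from a script or a CI job. A minimal sketch, assuming (hypothetically) that `SentimentAnalyzer.train` accepts a list of (text, label) pairs:

```python
from modelpersister.train import train

# assumption: the analyzer trains on (text, label) pairs; adapt to your data format
new_data = [
    ("This library made my day!", "positive"),
    ("The model keeps crashing.", "negative"),
]
train(new_data)  # overwrites the packaged model.pkl in place
```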
- `analyze.py`: accepts data points and analyzes them given the current `model.pkl`, something roughly like this:
```python
import pickle
import importlib.resources

from modelpersister.model_definition import SentimentAnalyzer

# load the current model as a package resource (small but important detail)
with importlib.resources.path("modelpersister", "model.pkl") as model_path:
    with open(model_path, "rb") as model_file:
        model: SentimentAnalyzer = pickle.load(model_file)

# make meaningful analyses available in this file
def estimate(data_point):
    return model.estimate(data_point)
```
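Once the package is installed, consumers never have to touch the pickle file themselves; a minimal usage sketch of the module above:

```python
from modelpersister.analyze import estimate

# the exact return value depends on your SentimentAnalyzer implementation
print(estimate("This library made my day!"))
```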
- `pyproject.toml`: a metadata file that poetry needs in order to package this code, something very similar to this:
```toml
[tool.poetry]
name = "modelpersister"
version = "0.1.0"
description = "Ship a sentiment analysis model."
authors = ["Mishaal <my@mail.com>"]
license = "MIT"  # a good default as far as licenses go

[tool.poetry.dependencies]
python = "^3.8"
sklearn = "^0.23"  # or whichever ML library you used for your model definition

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
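One detail to watch out for: poetry seeds its exclude list from your `.gitignore`, so if `model.pkl` is ignored there (as build artifacts often are), it may be dropped from the build. A sketch of the extra key you would add in that case, assuming the layout above:

```toml
[tool.poetry]
# only needed if model.pkl is excluded via .gitignore
include = ["modelpersister/model.pkl"]
```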
Given that all of these files are filled with meaningful code, and that the project hopefully has a better name than `modelpersister`, your workflow would look roughly like this:
- update your features in `model_definition.py`, train your model with `train.py` on better data, or add new functions in `analyze.py`, until you feel that your model is noticeably better than before
- run `poetry version minor` to update the package version
- run `poetry build` to build your code and model into a source distribution and a wheel file that you can, if you want, perform some final tests on
- run `poetry publish` to distribute your package – by default to the global Python Package Index, but you can also set up a private PyPI instance and tell `poetry` about it, or upload it manually somewhere else; downstream projects can then depend on it as sketched below
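A minimal sketch of a consumer's `pyproject.toml` entry, assuming (hypothetically) that you published version 0.2.0 after one `poetry version minor` bump:

```toml
[tool.poetry.dependencies]
modelpersister = "^0.2.0"  # hypothetical version after one minor bump
```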