I have a saved model for sentiment analysis, along with the code and data used to build it. I am trying to create a library that exposes functionality from this code and uses the trained model, but I do not understand how to incorporate the model and the functionality that depends on it.
Can anyone guide me on how to do that specifically?
Edit: Using pickle is the method I went with (answered below)
Answer
You need to know about three things if you want to maintain such a library properly:
- how to build a package
- how to version a package
- how to distribute a package
There are a few ways you could do that; the most user-friendly at the moment is probably poetry, so I'll use it as an example. It needs to be installed if you want to follow this post as a tutorial.
In order to have a very basic project skeleton to work with, I'll just assume that you have something similar to this:
```
modelpersister
├───modelpersister
│   ├───model.pkl
│   ├───__init__.py
│   ├───model_definition.py
│   ├───train.py
│   └───analyze.py
└───pyproject.toml
```
- `model.pkl`: the model artifact that you're going to ship with your package
- `__init__.py`: empty, but it needs to be there to make this folder a Python module
- `model_definition.py`: contains the class definition and the features that define your model
- `train.py`: accepts data to train your model and overwrites the current `model.pkl` file with the result, something roughly like this:
```python
import pickle
from pathlib import Path

from modelpersister.model_definition import SentimentAnalyzer

# overwrite the current model given some new data
def train(data):
    model = SentimentAnalyzer.train(data)
    # open in binary write mode, since pickle writes bytes
    with open(Path(__file__).parent / "model.pkl", "wb") as model_file:
        pickle.dump(model, model_file)
```
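Retraining then just means calling this function with fresh data, for example from a script or a CI job. A minimal sketch, assuming (hypothetically) that `SentimentAnalyzer.train` accepts a list of (text, label) pairs:

```python
from modelpersister.train import train

# assumption: the analyzer trains on (text, label) pairs; adapt to your data format
new_data = [
    ("This library made my day!", "positive"),
    ("The model keeps crashing.", "negative"),
]
train(new_data)  # overwrites the packaged model.pkl in place
```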
- `analyze.py`: accepts data points and analyzes them given the current `model.pkl`, something roughly like this:
```python
import pickle
import importlib.resources

from modelpersister.model_definition import SentimentAnalyzer

# load the current model as a package resource (small but important detail)
with importlib.resources.path("modelpersister", "model.pkl") as model_path:
    with open(model_path, "rb") as model_file:
        model: SentimentAnalyzer = pickle.load(model_file)

# make meaningful analyses available in this file
def estimate(data_point):
    return model.estimate(data_point)
```
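Once the package is installed, consumers never have to touch the pickle file themselves; a minimal usage sketch of the module above:

```python
from modelpersister.analyze import estimate

# the exact return value depends on your SentimentAnalyzer implementation
print(estimate("This library made my day!"))
```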
- `pyproject.toml`: a metadata file that poetry needs in order to package this code, something very similar to this:
```toml
[tool.poetry]
name = "modelpersister"
version = "0.1.0"
description = "Ship a sentiment analysis model."
authors = ["Mishaal <my@mail.com>"]
license = "MIT"  # a good default as far as licenses go

[tool.poetry.dependencies]
python = "^3.8"
sklearn = "^0.23"  # or whichever ML library you used for your model definition

[tool.poetry.dev-dependencies]

[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
```
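One detail to watch out for: poetry seeds its exclude list from your `.gitignore`, so if `model.pkl` is ignored there (as build artifacts often are), it may be dropped from the build. A sketch of the extra key you would add in that case, assuming the layout above:

```toml
[tool.poetry]
# only needed if model.pkl is excluded via .gitignore
include = ["modelpersister/model.pkl"]
```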
Given that all of these files are filled with meaningful code, and that the project hopefully has a better name than `modelpersister`, your workflow would look roughly like this:
- update your features in `model_definition.py`, train your model with `train.py` on better data, or add new functions in `analyze.py`, until you feel that your model is noticeably better than before
- run `poetry version minor` to update the package version
- run `poetry build` to build your code and model into a source distribution and a wheel file that you can, if you want, perform some final tests on
- run `poetry publish` to distribute your package – by default to the global Python Package Index, but you can also set up a private PyPI instance and tell `poetry` about it, or upload it manually somewhere else; downstream projects can then depend on it as sketched below
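A minimal sketch of a consumer's `pyproject.toml` entry, assuming (hypothetically) that you published version 0.2.0 after one `poetry version minor` bump:

```toml
[tool.poetry.dependencies]
modelpersister = "^0.2.0"  # hypothetical version after one minor bump
```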