I’m building a site that relies on the output of a machine learning algorithm. All that is needed for the user-facing part of the site is the output of the algorithm (class labels for a set of items), which can easily be stored in and retrieved from the Django models. The algorithm could be run once a day and does not rely on user input, so this part of the site only depends on Django and related packages.
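For context, the user-facing data is essentially just something like the following minimal sketch (the `Item` and `ClassLabel` names are made up for illustration, not the actual schema):

```python
# Minimal sketch of the user-facing models; names are illustrative only.
from django.db import models

class Item(models.Model):
    name = models.CharField(max_length=200)

class ClassLabel(models.Model):
    item = models.ForeignKey(Item, on_delete=models.CASCADE, related_name="labels")
    label = models.CharField(max_length=100)
    assigned_at = models.DateTimeField(auto_now_add=True)
```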
But developing, tuning, and evaluating the algorithm uses many other Python packages such as scikit-learn, pandas, numpy, matplotlib, etc. It also requires saving many different sets of class labels.
These dependencies cause some issues when deploying to Heroku, because numpy requires LAPACK/BLAS. It also seems like it would be good practice to have as few dependencies as possible in the deployed app.
How can I separate the machine-learning part from the user-facing part, but still have them integrated enough that the results of the algorithm are easily used?
I thought of creating two separate projects and then writing to the user-facing database in some way, but that seems like it would lead to maintenance problems (managing the dependencies, changes to database schemas, etc.).
As far as I understand, this problem is a little different from using different settings or databases for production and development, because it is more about managing different sets of dependencies.
Answer
To move what we discussed into an answer in case other people have the same question, my suggestions are:
Spend some time defining what the dependencies are for your site and for the algorithm code. Dump the dependency list into a requirements.txt for each project, and deploy them in different environments so that conflicts don’t happen.
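For example, the two requirements files might look roughly like this (packages and version pins are placeholders, not a prescription):

```text
# requirements.txt for the user-facing Django site
Django
djangorestframework
gunicorn

# requirements.txt for the algorithm project
numpy
scipy
scikit-learn
pandas
matplotlib
requests
```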
Develop some API endpoints on the site side using Django REST Framework or Tastypie, and let your algorithm code update your models through the API. Use cron to run your algorithm code regularly and push the data.
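On the site side, a minimal Django REST Framework sketch could look like this (the app, model, and field names are assumptions carried over from the sketch above, not part of the original question):

```python
# serializers.py / views.py / urls.py (minimal sketch; names are assumptions)
from rest_framework import serializers, viewsets, routers
from myapp.models import ClassLabel  # hypothetical app and model

class ClassLabelSerializer(serializers.ModelSerializer):
    class Meta:
        model = ClassLabel
        fields = ["id", "item", "label", "assigned_at"]

class ClassLabelViewSet(viewsets.ModelViewSet):
    queryset = ClassLabel.objects.all()
    serializer_class = ClassLabelSerializer

# Register the viewset so it is exposed at /api/labels/ (URL prefix is up to you).
router = routers.DefaultRouter()
router.register(r"labels", ClassLabelViewSet)
# Then include router.urls in your URLconf under e.g. "api/".
```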
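On the algorithm side, the daily job could then POST its results to that endpoint and be scheduled with cron. The endpoint URL, auth token, and label data below are placeholders:

```python
# push_labels.py (sketch) -- run by cron after the daily algorithm run.
import requests

API_URL = "https://example.com/api/labels/"  # placeholder endpoint
TOKEN = "..."                                # placeholder auth token

def push_labels(labels):
    """labels: iterable of (item_id, label) pairs produced by the algorithm."""
    headers = {"Authorization": "Token %s" % TOKEN}
    for item_id, label in labels:
        resp = requests.post(
            API_URL,
            json={"item": item_id, "label": label},
            headers=headers,
        )
        resp.raise_for_status()

if __name__ == "__main__":
    push_labels([(1, "spam"), (2, "ham")])  # example data only
```

A crontab entry to run the job once a day (paths are placeholders) might be:

```text
# Run the algorithm and push results every day at 02:00
0 2 * * * /path/to/venv/bin/python /path/to/push_labels.py
```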