I have a Django REST API endpoint. It receives a JSON payload, e.g.
{ "data" : [0,1,2,3] }
This is decoded in a views.py function, which creates a new database object like so (pseudocode):
newobj = MyObj(col0=0, col1=1, col2=2, col3=3)
newobj.save()
In tests, it is 20x faster to build a list of 1000 newobjs and then do a bulk create:
MyObj.objects.bulk_create(newobjs, batch_size=1000)
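For context, a rough sketch of the batched version being timed might look like this (the col0..col3 field names are taken from the example payload above, and payloads is a hypothetical list of already-decoded request bodies):

newobjs = []
for payload in payloads:  # payloads: a list of decoded {"data": [...]} dicts
    d = payload["data"]
    newobjs.append(MyObj(col0=d[0], col1=d[1], col2=d[2], col3=d[3]))
MyObj.objects.bulk_create(newobjs, batch_size=1000)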
So the question is: how do we save individual POSTs somewhere in Django, ready for a batch write once we have 1000 of them?
Answer
Thanks for the responses above; this answer includes some of what was suggested but is a superset, so here's a summary.
This is really about creating a FIFO. After trying it, memcached turns out to be unsuitable, because only Redis has the list operations that make this possible, explained nicely here.
Also note that Django's built-in cache does not support the Redis list API calls.
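As a quick illustration of the FIFO behaviour (a sketch using redis-py with a throwaway key name, not part of the actual setup): RPUSH appends to the tail of the list and LPOP removes from the head, so items come out in the order they went in.

import redis

r = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)
r.rpush("demo_fifo", "first")    # append to the tail of the list
r.rpush("demo_fifo", "second")
print(r.lpop("demo_fifo"))       # "first"  -- popped from the head, FIFO order
print(r.lpop("demo_fifo"))       # "second"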
So we need a new docker-compose.yml entry to add redis:
redis:
  image: redis
  ports:
    - 6379:6379/tcp
  networks:
    - app-network
Then in views.py we add the following (note the use of the Redis RPUSH command):
import json
import os

import redis

...

redis_host = os.environ['REDIS_HOST']
redis_port = 6379
redis_password = ""
r = redis.StrictRedis(host=redis_host, port=redis_port, password=redis_password,
                      decode_responses=True)

...

def write_post_to_redis(request):
    payload = json.loads(request.body)
    r.rpush("df", json.dumps(payload))
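In practice a Django view also has to return an HttpResponse, so a hedged sketch of how the endpoint might be finished off (the response body here is an assumption, not from the original setup):

from django.http import JsonResponse

def write_post_to_redis(request):
    payload = json.loads(request.body)
    r.rpush("df", json.dumps(payload))      # enqueue for the background writer
    return JsonResponse({"queued": True})   # acknowledge the POST straight away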
So this pushes the received payload into the Redis in-memory store. We now need to read (or pop) it and write it to the Postgres database, which calls for a process that wakes up every n seconds and checks the queue. For this we use django-background-tasks. First, install it with:
pipenv install django-background-tasks
And add it to INSTALLED_APPS in settings.py:
INSTALLED_APPS = [
    ...
    'background_task',
]
Then run a migrate to add the background task tables:
python manage.py migrate
Now in views.py, add:
from background_task import background
from background_task.models import CompletedTask
And add the function that writes the cached data to the Postgres database. Note the decorator, which schedules it to run in the background 5 seconds after it is queued, and the use of the Redis LPOP command.
@background(schedule=5)
def write_cached_samples():
    ...
    payload = json.loads(r.lpop('df'))
    # now do your write of payload to postgres
    ...
    # delete the completed tasks or we'll have a big db leak
    CompletedTask.objects.all().delete()
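The elided write itself can be done as a batch, draining whatever has accumulated in the list and bulk-creating the rows in one go. A hedged sketch, assuming the MyObj model and col0..col3 fields from the example payload (the original leaves this part open):

@background(schedule=5)
def write_cached_samples():
    newobjs = []
    while True:
        raw = r.lpop('df')            # oldest payload first; None when the list is empty
        if raw is None:
            break
        data = json.loads(raw)["data"]
        newobjs.append(MyObj(col0=data[0], col1=data[1], col2=data[2], col3=data[3]))
    if newobjs:
        MyObj.objects.bulk_create(newobjs, batch_size=1000)
    # delete completed task records or the background_task table grows without bound
    CompletedTask.objects.all().delete()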
To start the process up, add the following at the bottom of urls.py:
write_cached_samples(repeat=10, repeat_until=None)
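In context, the bottom of urls.py might look roughly like this (the import path is an assumption about the project layout); with repeat=10 the task re-queues itself roughly every 10 seconds, and repeat_until=None keeps it repeating indefinitely:

# urls.py (sketch)
from django.urls import path
from . import views

urlpatterns = [
    # ... existing routes ...
]

# Queue the background task once at import time; process_tasks keeps re-running it.
views.write_cached_samples(repeat=10, repeat_until=None)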
Finally, because the background task needs a separate process, we duplicate the Django container in docker-compose.yml but replace the ASGI server run command with the background-process run command.
django_bg:
  image: my_django
  command: >
    sh -c "python manage.py process_tasks"
  ...
In summary, we add two new Docker containers: one for the Redis in-memory cache and one to run the Django background tasks. We use the Redis list commands RPUSH and LPOP to create a FIFO, with the API receive path pushing and a background task popping.
There was a small issue where nginx was connecting to the wrong Django container; this was rectified by stopping and restarting the background container, apparently an issue with Docker network routing initialising incorrectly.
Next I am replacing the Django HTTP API endpoint with a Go one to see how much of a speed-up we get, as the Daphne ASGI server hits max CPU at only 100 requests per second.