How to cache individual Django REST API POSTs for bulk_create?




I have a Django REST API endpoint. It receives a JSON payload, e.g.

{ "data" : [0,1,2,3] }

This is decoded in a views.py function and generates a new database object like so (pseudo code):

newobj = MyObj(col0=0, col1=1, col2=2, col3=3)
newobj.save()

In tests, it is 20x faster to build a list of 1000 new objects and then do a single bulk create:

MyObj.objects.bulk_create(newobjs, batch_size=1000)

So the question is: where can Django save individual POSTs so they are ready for a batch write once 1000 of them have accumulated?

Answer

Thanks for the responses above; this answer includes some of what was suggested but is a superset, so here's a summary.

This is really about creating a FIFO. After trying memcached, it turns out to be unsuitable, because only Redis has the list operations that enable this, explained nicely here.

Also note that Django's built-in cache framework does not expose the Redis list API calls.
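The FIFO behaviour those list operations give can be sketched with a pure-Python stand-in (here collections.deque plays the role of the Redis list, so no server is needed; rpush and lpop mimic the Redis commands of the same name):

```python
from collections import deque

# Stand-in for the Redis list "df": RPUSH appends at the tail,
# LPOP removes from the head, so items come out first-in-first-out.
queue = deque()

def rpush(value):
    """Analogue of r.rpush("df", value)."""
    queue.append(value)

def lpop():
    """Analogue of r.lpop("df"); returns None when empty, like Redis."""
    return queue.popleft() if queue else None

rpush('{"data": [0,1,2,3]}')
rpush('{"data": [4,5,6,7]}')

print(lpop())  # the first payload pushed is the first one popped
```

This ordering is exactly what lets one process push payloads as they arrive while another pops them off for batching.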

So we need a new docker-compose.yml entry to add redis:

  redis:
    image: redis
    ports:
      - 6379:6379/tcp
    networks:
      - app-network  
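Since the views.py code below reads a REDIS_HOST environment variable, the Django service entry also needs that variable set; a minimal sketch, assuming the service name redis resolves on app-network:

```yaml
  django:
    ...
    environment:
      - REDIS_HOST=redis
    networks:
      - app-network
```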

Then in views.py we add (note the use of the Redis RPUSH command):

import json
import os

import redis
...
redis_host = os.environ['REDIS_HOST']
redis_port = 6379
redis_password = ""
r = redis.StrictRedis(host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
...
def write_post_to_redis(request):
    payload = json.loads(request.body)
    r.rpush("df", json.dumps(payload))

This pushes each received payload onto the Redis in-memory list. We now need to read (or pop) it and write it to the Postgres database, so we need a process that wakes up every n seconds and checks the list. For this we use django-background-tasks. First, install it with:

pipenv install django-background-tasks

And add 'background_task' to INSTALLED_APPS in settings.py:

INSTALLED_APPS = [
    ...
    'background_task',
]
Then run a migrate to add the background task tables:

python manage.py migrate

Now in views.py, add:

from background_task import background
from background_task.models import CompletedTask

And add the function that writes the cached data to the Postgres database. Note the decorator, which schedules it to run in the background (schedule=5 runs it 5 seconds after it is queued; the repeat interval is set when it is called, below). Also note the use of the Redis LPOP command.

@background(schedule=5)
def write_cached_samples():
    ...
    payload = json.loads(r.lpop('df'))
    # now do your write of payload to postgres
    ...
    # delete the completed tasks or we'll have a big db leak
    CompletedTask.objects.all().delete()
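To get back the bulk_create speed-up, the background task can drain the list in batches rather than doing one lpop per run. A minimal sketch (drain_batch and max_items are hypothetical names, not part of the answer above); it accepts any lpop-style callable, so it works with lambda: r.lpop("df") or with a plain queue in tests:

```python
import json

def drain_batch(lpop, max_items=1000):
    """Pop up to max_items JSON payloads from the FIFO and decode them.

    lpop is any callable returning the next raw payload, or None when
    the queue is empty -- e.g. lambda: r.lpop("df") for the Redis list.
    """
    batch = []
    for _ in range(max_items):
        raw = lpop()
        if raw is None:
            break
        batch.append(json.loads(raw))
    return batch
```

Inside write_cached_samples() the decoded batch would then go to Postgres in a single call, along the lines of MyObj.objects.bulk_create([MyObj(...) for p in batch], batch_size=1000).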

In order to start the process up, add the following to the base of urls.py:

write_cached_samples(repeat=10, repeat_until=None)

Finally, because the background task needs its own process, we duplicate the Django container in docker-compose.yml but replace the ASGI server run command with the background-task run command.

  django_bg:
    image: my_django
    command: >
      sh -c "python manage.py process_tasks"
    ...

In summary, we add two new docker containers: one for the Redis in-memory cache, and one to run the Django background tasks. We use the Redis list RPUSH and LPOP commands to create a FIFO, with the API receiver pushing and a background task popping.

There was a small issue where nginx was connecting to the wrong Django container; it was rectified by stopping and restarting the background container, and appears to be an issue where Docker network routing initialises wrongly.

Next I am replacing the Django HTTP API endpoint with a Go one to see how much of a speed-up we get, as the Daphne ASGI server hits max CPU at only 100 requests per second.



Source: stackoverflow