I have a Django REST API endpoint. It receives a JSON payload, e.g.:
{ "data" : [0,1,2,3] }
This is decoded in a views.py function, which creates a new database object like so (pseudo code):
newobj = MyObj(col0=0, col1=1, col2=2, col3=3)
newobj.save()
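A slightly fuller sketch of that decode step, assuming the list positions map onto columns named col0, col1 and so on. The helper name payload_to_columns is hypothetical and only for illustration:

```python
def payload_to_columns(payload):
    """Map {"data": [a, b, ...]} to {"col0": a, "col1": b, ...}."""
    return {f"col{i}": value for i, value in enumerate(payload["data"])}


columns = payload_to_columns({"data": [0, 1, 2, 3]})
# In the view this would become something like: MyObj(**columns).save()
```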
In tests, it is 20x faster to build a list of 1000 newobjs and then do a single bulk create:
MyObj.objects.bulk_create(newobjs, batch_size=1000)
So the question is: how can individual POSTs be stored somewhere in Django, ready for a batch write once we have 1000 of them?
Answer
Thanks for the above responses. The answer below includes some of what was suggested but is a superset, so here is a summary.
This is really about creating a FIFO. memcached turned out to be unsuitable (after trying it) because, of the two, only redis has a list type that enables this, explained nicely here.
Also note that the Django built-in cache does not support the redis list API calls.
So we need a new docker-compose.yml entry to add redis:
  redis:
    image: redis
    ports:
      - 6379:6379/tcp
    networks:
      - app-network  
Then in views.py we add the following (note the use of redis rpush):
import json
import os
import redis
...
redis_host = os.environ['REDIS_HOST']
redis_port = 6379
redis_password = ""
r = redis.StrictRedis(host=redis_host, port=redis_port, password=redis_password, decode_responses=True)
...
def write_post_to_redis(request):
    # Decode the POST body, then append it to the tail of the "df" list
    payload = json.loads(request.body)
    r.rpush("df", json.dumps(payload))
This pushes the received payload onto a list in the redis in-memory store. We now need to read (pop) it and write it to the postgres database, so we need a process that wakes up every n seconds and checks the list. For this we use django-background-tasks. First, install it with:
pipenv install django-background-tasks
And add it to INSTALLED_APPS in settings.py:
INSTALLED_APPS = [
    ...
    'background_task',
]
Then run migrate to add the background task tables:
python manage.py migrate
Now in views.py, add:
from background_task import background
from background_task.models import CompletedTask
And add the function that writes the cached data to the postgres database. Note the decorator, which marks it as a background task and delays its first run by 5 seconds. Also note the use of redis lpop.
@background(schedule=5)
def write_cached_samples():
    ...
    raw = r.lpop('df')  # None when the list is empty
    if raw is not None:
        payload = json.loads(raw)
        # now do your write of payload to postgres
    # ... and delete the completed tasks or we'll have a big db leak
    CompletedTask.objects.all().delete()
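The task above pops one payload per run. To get closer to the batch write asked for in the question, one option is to pop up to 1000 items per run and hand them to bulk_create together. The following is only a sketch under assumptions: drain_batch is a hypothetical helper, and FakeRedis is a hypothetical in-memory stand-in for the two redis calls (rpush and lpop) so the example runs without a server. In views.py the real client r would be passed instead.

```python
import json
from collections import deque


def drain_batch(client, key, max_items=1000):
    """Pop up to max_items JSON payloads from the head of a redis list."""
    batch = []
    while len(batch) < max_items:
        raw = client.lpop(key)  # returns None once the list is empty
        if raw is None:
            break
        batch.append(json.loads(raw))
    return batch


class FakeRedis:
    """Hypothetical stand-in implementing only rpush and lpop."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        # Append to the tail of the list, creating it if needed
        self._lists.setdefault(key, deque()).append(value)

    def lpop(self, key):
        # Remove and return the head of the list, or None if empty
        items = self._lists.get(key)
        return items.popleft() if items else None


fake = FakeRedis()
for i in range(5):
    fake.rpush("df", json.dumps({"data": [i, i + 1, i + 2, i + 3]}))

first = drain_batch(fake, "df", max_items=3)  # oldest three payloads, in arrival order
rest = drain_batch(fake, "df", max_items=3)   # the remaining two
```

Inside write_cached_samples, the returned list could then be turned into MyObj instances and passed to MyObj.objects.bulk_create in a single call.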
In order to start the process, add the following to the bottom of urls.py (repeat is the interval in seconds between runs):
write_cached_samples(repeat=10, repeat_until=None)
Finally, because the background task needs a separate process, we duplicate the django docker container in docker-compose.yml but replace the asgi server run command with the background process run command.
  django_bg:
    image: my_django
    command: >
      sh -c "python manage.py process_tasks"
    ...
In summary, we add two new docker containers: one for the redis in-memory store and one to run the Django background tasks. We use the redis list functions rpush and lpop to create a FIFO, with the API endpoint pushing and a background task popping.
There was a small issue where nginx was connecting to the wrong Django container. It was rectified by stopping and restarting the background container; it looks like docker network routing was initialising wrongly.
Next I am replacing the Django HTTP API endpoint with a Go one to see how much of a speed-up we get, as the Daphne ASGI server is hitting max CPU at only 100 requests per second.