How to determine the optimal buffer size with asyncio/aiohttp

Question

How do we decide the optimal parameter for read() when working with asyncio in Python? 12 bytes? 100 bytes?

async with self._session.get(url, headers=headers) as response:
    chunk_size = 12
    chunks = []

    while True:
        chunk = await response.content.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)

    # Decode once at the end: decoding each chunk separately can fail
    # when a multi-byte UTF-8 character is split across two chunks.
    result = b''.join(chunks).decode('utf8')

Answer

How do we decide the optimal parameter for read() when working with asyncio in python? 12 bytes? 100 bytes?

You can safely choose a much larger number than that. If the number is too small (e.g. just 1), your loop will consist of many calls to StreamReader.read, each of which carries a fixed overhead: it has to check whether there’s something in the buffer, and either return a portion of that and update the remaining buffer, or wait for something new to arrive. On the other hand, if the requested size is excessively large, it might in theory require unnecessarily large allocations. But StreamReader.read is allowed to return less data than requested, and it never returns a chunk larger than the internal buffer (64 KiB by default), so that’s a non-issue.
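To illustrate the "may return less than requested" behaviour, here is a minimal sketch that uses asyncio.StreamReader directly. Instantiating the reader by hand and feeding it with feed_data/feed_eof is done purely for demonstration (normally the reader comes from open_connection or, with aiohttp, from response.content); the function name short_read_demo and the 100-byte payload are made up for the example.

```python
import asyncio


async def short_read_demo():
    # Hand-built reader, for demonstration only; real code gets its
    # reader from asyncio.open_connection() or a similar helper.
    reader = asyncio.StreamReader()
    reader.feed_data(b"x" * 100)  # only 100 bytes are buffered
    reader.feed_eof()

    # Ask for 64 KiB: read() returns the 100 buffered bytes instead of
    # waiting for the full amount.
    first = await reader.read(65536)
    # At EOF with an empty buffer, read() returns an empty bytes object.
    second = await reader.read(65536)
    return len(first), second


if __name__ == "__main__":
    print(asyncio.run(short_read_demo()))  # (100, b'')
```

A large argument is therefore only an upper bound on the size of the returned chunk, not a request to wait for that many bytes.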

In summary: any number above 1024 or so will do fine, because it is large enough to avoid an excessive number of function calls. Requesting more than 65536 is in most cases the same as requesting 65536. I tend to request 1024 bytes when I don’t care about absolute best performance (smaller chunks are easier on the eyes when debugging), and a larger value like 16384 when I do. The numbers don’t have to be powers of 2, btw; that’s just a convention from lower-level languages.
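As a rough way to see the effect of the chunk size, the sketch below counts how many read() calls it takes to drain a fixed payload. It again assumes a hand-fed asyncio.StreamReader as a stand-in for a real network stream; count_reads and the 16384-byte payload are hypothetical names and values chosen for the example, and it measures the number of calls, not wall-clock time.

```python
import asyncio


async def count_reads(payload: bytes, chunk_size: int) -> int:
    # Stand-in for a network stream: all data is already buffered.
    reader = asyncio.StreamReader()
    reader.feed_data(payload)
    reader.feed_eof()

    calls = 0
    # read() returns b"" at EOF, which ends the loop.
    while await reader.read(chunk_size):
        calls += 1
    return calls


async def main():
    payload = b"x" * 16384
    for size in (12, 1024, 16384):
        print(size, await count_reads(payload, size))


if __name__ == "__main__":
    asyncio.run(main())
```

With this payload, a chunk size of 12 needs 1366 calls that return data, 1024 needs 16, and 16384 needs one.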

When dealing specifically with aiohttp streams, you can call readany, a method that returns whatever data is available and, if nothing is available, waits for some data to arrive and returns that. That is probably the best option there, because it just gives you the data from the internal buffer without you having to wonder about its size.
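A readany-based loop might look like the sketch below. The helper read_body is a hypothetical name, and FakeStream is a made-up stand-in so the example runs without a network connection; with aiohttp you would pass response.content instead.

```python
import asyncio


async def read_body(stream) -> bytes:
    """Drain a stream that offers an aiohttp-style readany() method."""
    chunks = []
    while True:
        chunk = await stream.readany()  # whatever is buffered right now
        if not chunk:                   # empty bytes means end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)


class FakeStream:
    """Hypothetical stand-in for response.content, for this demo only."""

    def __init__(self, parts):
        self._parts = list(parts)

    async def readany(self) -> bytes:
        return self._parts.pop(0) if self._parts else b""


if __name__ == "__main__":
    body = asyncio.run(read_body(FakeStream([b"hel", b"lo"])))
    print(body.decode("utf8"))
```

In real code the call would be something like `body = await read_body(response.content)` inside the `async with session.get(...)` block. Decoding happens once on the joined bytes, so chunk boundaries cannot split a multi-byte character.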
