Optimal way of storing stream content on disk using Python

I’d like to stream data directly to disk.

One way of doing that is simply to read the data and write it to a file, but I also want to minimize RAM usage.

with open("dummy.source", "rb") as src, open("dummy.copy", "wb") as dst:
    dst.write(src.read())  # this reads the whole stream into memory at once

I’ve figured out a manual way of doing that:

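BUFFER_SIZE = 8192  # chunk size in bytes (placeholder; choosing this value is the question)

with open("dummy.source", "rb") as src, open("dummy.copy", "wb") as dst:
    # read() returns b"" at end of file, which ends the loop
    while chunk := src.read(BUFFER_SIZE):
        dst.write(chunk)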

Do I really have to load the stream manually, part by part? If so, how can I determine the optimal value of BUFFER_SIZE?

Answer

The optimal buffer size is most likely the size of the buffer Python already reserves for the file, which is 8192 bytes on most systems. Any value below that is also fine, because Python buffers the I/O anyway: small reads are served from its internal buffer instead of going to the disk each time.

You can change that size with the buffering argument of open(), but 8192 is the optimal size on a lot of systems.
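As a rough illustration of the buffering argument, here is a minimal sketch. The helper name copy_with_buffer and the default sizes are made up for this example and are not tuned recommendations; in binary mode, an integer greater than 1 passed as buffering sets the size of Python's internal buffer in bytes.

```python
def copy_with_buffer(src_path, dst_path, buffer_size=64 * 1024, chunk_size=4096):
    """Copy a file using an explicit internal buffer size on both ends.

    buffer_size and chunk_size are illustrative defaults, not recommendations.
    """
    # buffering=<int> sets the size of Python's internal buffer for each file
    with open(src_path, "rb", buffering=buffer_size) as src, \
         open(dst_path, "wb", buffering=buffer_size) as dst:
        # Reads smaller than the buffer are served from memory where possible
        while chunk := src.read(chunk_size):
            dst.write(chunk)
```

Whether a larger buffer actually helps depends on the disk and the operating system, so it is worth measuring before settling on a value.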

You can read the default from the running Python interpreter with:

from io import DEFAULT_BUFFER_SIZE

Importing it instead of hard-coding 8192 covers the case where the value changes in the future or differs for a given Python interpreter.
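Putting this together with the loop from the question, a sketch could look like the following. The function names copy_in_chunks and copy_with_shutil are hypothetical; the second one is not part of the answer above, it only shows that the standard library's shutil.copyfileobj performs the same chunked loop, so the loop does not have to be written by hand.

```python
import shutil
from io import DEFAULT_BUFFER_SIZE


def copy_in_chunks(src_path, dst_path, chunk_size=DEFAULT_BUFFER_SIZE):
    """Copy a file chunk by chunk and return the number of bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Only one chunk is held in memory at a time
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            copied += len(chunk)
    return copied


def copy_with_shutil(src_path, dst_path, chunk_size=DEFAULT_BUFFER_SIZE):
    """Same idea, delegating the read/write loop to the standard library."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Both versions keep memory usage bounded by roughly one chunk, regardless of the size of the source file.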

User contributions licensed under: CC BY-SA