I’d like to stream data directly to disk.
One way to do that is simply to read the data and write it to a file, but I also want to minimize RAM usage:
```python
with open("dummy.source", "rb") as src, open("dummy.copy", "wb") as dst:
    dst.write(src.read())  # this reads the whole file into memory at once
```
I’ve worked out a manual way of doing it:
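```python
BUFFER_SIZE = 64 * 1024  # placeholder chunk size in bytes; how to choose it is my question

with open("dummy.source", "rb") as src, open("dummy.copy", "wb") as dst:
    # read() returns b"" at end of file, which ends the loop
    while chunk := src.read(BUFFER_SIZE):
        dst.write(chunk)
```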
Do I really have to load the stream manually, chunk by chunk?
If so, how can I determine the optimal value of `BUFFER_SIZE`?
Answer
The optimal value is most likely the size of the buffer Python already allocates for the file object, which is typically 4096 or 8192 bytes depending on the system. Any value at or below that is fine, because Python buffers the I/O anyway.
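A minimal sketch of the chunked copy from the question, using the interpreter's default buffer size as the chunk size. The function name `copy_in_chunks` is hypothetical, chosen only for illustration.

```python
import io


def copy_in_chunks(src_path: str, dst_path: str,
                   chunk_size: int = io.DEFAULT_BUFFER_SIZE) -> int:
    """Copy src_path to dst_path one chunk at a time; return the number of bytes copied."""
    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # read() returns b"" at end of file, which ends the loop
        while chunk := src.read(chunk_size):
            dst.write(chunk)
            total += len(chunk)
    return total
```

Usage: `copy_in_chunks("dummy.source", "dummy.copy")`. Only one chunk is held in memory at a time, in addition to Python's own file buffers.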
You can change Python's buffer size with the `buffering` argument of `open()`, but the default of 8192 bytes works well on many systems.
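A sketch of setting the buffer size explicitly. For binary files, passing an integer greater than 1 as `buffering` asks `open()` for a fixed-size buffer of that many bytes. The 1 MiB value and the name `copy_with_buffer` are hypothetical examples, not recommendations; whether a larger buffer helps is something to measure on your own system.

```python
# Hypothetical example value: a 1 MiB buffer instead of the default.
BUFFER_SIZE = 1024 * 1024


def copy_with_buffer(src_path: str, dst_path: str, buffer_size: int = BUFFER_SIZE) -> None:
    # buffering > 1 sets the size of open()'s internal buffer (binary mode);
    # the loop then reads chunks of the same size.
    with open(src_path, "rb", buffering=buffer_size) as src, \
         open(dst_path, "wb", buffering=buffer_size) as dst:
        while chunk := src.read(buffer_size):
            dst.write(chunk)
```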
You can also read the default from the running interpreter:
```python
from io import DEFAULT_BUFFER_SIZE
```
This keeps your code correct if the value changes in a future Python version or differs in a particular interpreter.
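On whether the loop has to be written by hand: the standard library's `shutil.copyfileobj` runs essentially the same read/write loop internally. A sketch, passing `DEFAULT_BUFFER_SIZE` as the chunk length (an illustrative choice; if `length` is omitted, `copyfileobj` uses its own default). The wrapper name `copy_file` is hypothetical.

```python
import shutil
from io import DEFAULT_BUFFER_SIZE


def copy_file(src_path: str, dst_path: str) -> None:
    """Stream src_path into dst_path without loading the whole file into memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # copyfileobj repeatedly reads up to `length` bytes from src and writes them to dst
        shutil.copyfileobj(src, dst, length=DEFAULT_BUFFER_SIZE)
```

For plain file-to-file copies, `shutil.copyfile(src, dst)` goes a step further and may use platform-specific fast-copy system calls internally (Python 3.8+).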