
How to create a big file quickly with Python

I have the following code for producing a big text file:

JavaScript

But it seems to be pretty slow to even generate 5GB of this.

How can I make it faster? I want the output to look like this:

JavaScript

Answer

Well, of course, the whole thing is I/O bound. You can’t output the file faster than the storage device can write it. Leaving that aside, there are some optimizations that could be made.

Your method of building up a long string from several shorter strings is suboptimal. You’re saying, essentially, s = s1 + s2. Python strings are immutable, so when you tell Python to do this, it creates a new string object and copies both operands into it. A single concatenation is cheap, but repeating it in a loop copies the same characters over and over.

A much better way is to collect the individual string objects in a list or other iterable, then use the join method to run them together. For example:

JavaScript
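A minimal sketch of the join approach, assuming the fields are integers separated by single spaces. The function name `join_fields` and the sample values are illustrative, not taken from the question:

```python
def join_fields(values):
    """Build one line of text from an iterable of values.

    The pieces are collected in a list and joined once at the end,
    rather than being appended to a growing string with `+`.
    """
    parts = []
    for value in values:
        parts.append(str(value))
    # A single join call builds the final string in one step.
    return " ".join(parts)


line = join_fields([3, 14, 15, 92])
print(line)  # 3 14 15 92
```

The separator is whatever string `join` is called on, so `",".join(parts)` would produce comma-separated fields instead.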

Instead of n-1 string concatenations to join n strings, this does the whole thing in one step.

There’s also a lot of repeated code that could be combined. Here’s a cleaner design, still using the loops.

JavaScript
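As a rough sketch of what a loop-based design along these lines might look like, assuming each output line holds a fixed number of random integers separated by spaces. The names `write_rows`, `n_rows`, `n_cols`, and `seed`, and the value range, are assumptions made for illustration:

```python
import random


def write_rows(path, n_rows, n_cols, seed=None):
    """Write n_rows lines, each with n_cols random integers (assumed format)."""
    rng = random.Random(seed)
    with open(path, "w") as out:
        for _ in range(n_rows):
            fields = []
            for _ in range(n_cols):
                # The range 0..9999 is arbitrary; adjust to the real data.
                fields.append(str(rng.randint(0, 9999)))
            # Join once per line, then write the newline separately.
            out.write(" ".join(fields))
            out.write("\n")
```

Calling `write_rows("big.txt", 1_000_000, 10)` would then produce a file of one million lines; the row count needed for a given file size depends on the actual line format.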

A cleaner, briefer, more Pythonic way is to use a list comprehension:

JavaScript
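A sketch of the list-comprehension form, under the same assumption that each line is a row of space-separated random integers. The function name `write_rows_compact` and its parameters are hypothetical:

```python
import random


def write_rows_compact(path, n_rows, n_cols, seed=None):
    """Same assumed output format as a nested-loop version, in fewer lines."""
    rng = random.Random(seed)
    with open(path, "w") as out:
        for _ in range(n_rows):
            # The comprehension builds all fields for one line at once.
            fields = [str(rng.randint(0, 9999)) for _ in range(n_cols)]
            out.write(" ".join(fields))
            out.write("\n")
```

The comprehension replaces only the inner loop; the outer loop stays so that each line is written as soon as it is built instead of holding the whole file in memory.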

Note that in both cases, I wrote the newline separately. That should be faster than concatenating it to the string, since I/O is buffered anyway. If I were joining a list of strings without separators, I’d just tack on a newline as the last string before joining.
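For the no-separator case, the idea of tacking the newline on as the last string could be sketched like this. The helper name `join_with_newline` is made up for the example:

```python
def join_with_newline(parts):
    """Join strings with no separator, ending the result with a newline."""
    pieces = list(parts)
    # The newline is just one more item in the list, so a single
    # join call still produces the whole line.
    pieces.append("\n")
    return "".join(pieces)


print(repr(join_with_newline(["ab", "cd", "ef"])))  # 'abcdef\n'
```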

As Daniel’s answer says, numpy is probably faster, but maybe you don’t want to get into numpy yet; it sounds like you’re kind of a beginner at this point.

User contributions licensed under: CC BY-SA