Skip to content
Advertisement

Sort the digits of a 1GB file containing a single number efficiently

I’m trying to print in ascending order a 1GB file containing a randomly generated big number. This is the code that I’m using to generate the random number for my test (found it here).

JavaScript

The following python code works OK and takes a bit less than 4 minutes. But I was told this can be accomplished in about 15 seconds and that I might not even need to sort the number at all for printing it in order which is kind of puzzling.

This is not a homework but a question I was asked in a job interview that i didn’t manage to solve and is killing me not knowing the solution. I was not asked to do this on any specific language but I decided to use python because I’m familiar with the language. I gave a quick test of using bash but it was already struggling my script with a 10MB file.

JavaScript

This basically chunks the file on temp files of 100MB, sort each chunk and then do a merge sort to append them. Just running a sort on the file will end up in a “MemoryError”

I also tried to just read the file once with a for and do a series of if/else to append each value to 10 different variables and then print them in order from 0 to 10 but this is very inefficient and slower than my initial method.

Clearly there needs to be a “trick” to solve this problem.

Advertisement

Answer

As everyone indicates, the expected answer is a counting sort.

It takes a little extra effort, though, to make a python-implemented counting sort beat the built-in string.sort(), which is written in C++. It’s especially important to avoid creating a new python string object for each character of data.

One solution is to use the built-in string.sort(), followed by 10 calls to string.index() to get the counts for each chunk.

I decided to use 10 calls to string.count(). Here’s the implementation:

JavaScript

Your original results:

JavaScript

My results:

JavaScript

I also noticed that your output file is a little longer than the input file, so you have a bug in there somewhere that I didn’t bother finding.

Advertisement