I am working on an implementation of recursive quicksort in python. I am working with very large data sets (10,000 – 1,000,000 elements). When feeding it ordered data (i.e. changing an array sorted from largest -> smallest to smallest -> largest) it works fine. But when giving it unordered data, it doesn’t seem to work at all.
I’m using a 3-way pivot that counts elements that are equal to the pivot (dutch national flag problem) and, if the data is larger than 101 (remember it’s recursive), it uses a median of first, middle, and last elements in the array partition when selecting the pivot.
Slice of output from a random set of 10,000 elements:
129 1 16 7 21 4 16 19 4 15 22 23 1 20 25 27 1 4 15 31 31 37 41 32 42 46 46 44
As you can see, it seems to have a certain pattern to it. I’ve increased the recursion limit so I know that’s not the problem, but it almost seems like it’s only partially sorting the input. Like I mentioned, it works fine for an array that’s already sorted in reverse order, which is what mystifies me.
My code:
def array_swap(A, i, j): temp = A[i] A[i] = A[j] A[j] = temp # take median of three elements def median_three(a, b, c): return sorted([a, b, c])[1] def qs_part(A, lo, hi): mid = lo p = median_three(A[lo], A[(lo + hi) // 2], A[hi]) # pivot set to median of 3, as described in handout # ('//' must be used for floor division in Python 3 -- it took me longer than I'm willing to admit to figure that out.) while mid <= lo: if A[mid] < p: array_swap(A, mid, lo) mid += 1 lo += 1 elif A[mid] > p: array_swap(A, mid, hi) hi -= 1 else: mid += 1 return (lo - 1), mid # Quicksort implementation as described in handout. # A: Array to sort # write: name of file to write sorted results to # out: name of file to write runtime to # lo: start index (1 after pivot in first case) # hi: end index def quick_sort(A,lo, hi): if lo >= hi: return # if there are 0 or 1 elements in A if lo - hi == 1: if A[lo] < A[hi]: # if 2 elements, sort them array_swap(A, lo, hi) return i, j = qs_part(A, lo, hi) quick_sort(A, lo, i) # recursively sort partition 1 (elements < pivot) quick_sort(A, j, hi) # recursively sort partition 2 (elements > pivot) # no need to sort between 'i' and 'j' because those items == pivot
Advertisement
Answer
I figured it out. It was just a small typo in qs_part(). while mid <= lo
should have been while mid <= hi