I have the following problem as homework:
Write a O(N^2) algorithm to determine whether the string can be broken into a list of words. You can start by writing an exponential algorithm and then using dynamic programming to improve the runtime complexity.
The naive exponential algorithm which I started out with is this:
def naiveWordBreak(S): if len(S) == 0: return True else: return any([(S[:i] in wordSet) and naiveWordBreak(S[i:]) for i in range(1, len(S) + 1)])
I then adapted this into the following dynamic algorithm:
def wordBreak(S): prefixTable = [False] * (len(S) + 1) prefixTable[0] = True return _helper(S, prefixTable) def _helper(S, prefixTable): if prefixTable[len(S)]: return prefixTable[len(S)] else: for i in range(1, len(S) + 1): if S[:i] in wordSet and _helper(S[i:], prefixTable): prefixTable[i] = True return True
I am fairly confident from my proof and some testing that both algorithms are correct, however the recursive method should be exponential time while the dynamic method should be O(n^2). However, I got curious and used the timeit
library to analyze the time it takes for both algorithms to run a batch of tests and the results were surprising. The dynamic method only beats the recursive method by a fraction of a second. More confusing is that after running the same test a couple of times, the recursive method actually gave a better runtime than the dynamic method. Here is the code I’m using for testing runtime:
def testRecursive(): naiveWordBreak("alistofwords") naiveWordBreak("anotherlistofwords") naiveWordBreak("stableapathydropon") naiveWordBreak("retouchesreissueshockshopbedrollsunspotassailsinstilledintersectionpipitreappointx") naiveWordBreak("xretouchesreissueshockshopbedrollsunspotassailsinstilledintersectionpipitreappoint") naiveWordBreak("realignitingrains") def testDynamic(): wordBreak("alistofwords") wordBreak("anotherlistofwords") wordBreak("stableapathydropon") wordBreak("retouchesreissueshockshopbedrollsunspotassailsinstilledintersectionpipitreappointx") wordBreak("xretouchesreissueshockshopbedrollsunspotassailsinstilledintersectionpipitreappoint") wordBreak("realignitingrains") def main(): recTime = timeit.timeit(testRecursive, number=1) dynTime = timeit.timeit(testDynamic, number=1) print("Performance Results:n") print("Time for recursive method = {}n".format(recTime)) print("Time for dynamic method = {}n".format(dynTime)) if dynTime < recTime: print("Dynamic method produces better performance") else: print("Recursive method produces better performance")
The way I see it, there are only a few explanations for why the runtimes are inconsistent/not what I expected:
- There is something wrong with my dynamic algorithm (or my analysis of it)
- There is something wrong with my recursive algorithm
- My test cases are insufficient
timeit
isn’t actually an appropriate library for what I’m trying to do
Does anyone have any insights or explanations?
Advertisement
Answer
The naive recursive approach is only slow when there are many, many, ways to break up the same string into words. If there is only one way, then it will be linear.
Assuming that can
, not
and cannot
are all words in your list, try a string like "cannot" * n
. By the time you get to n=40
, you should see the win pretty clearly.