I’m looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
- using suffix trees
- using dynamic programming.
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).
Advertisement
Answer
These paired functions will find the longest common string in any arbitrary array of strings:
def long_substr(data): substr = '' if len(data) > 1 and len(data[0]) > 0: for i in range(len(data[0])): for j in range(len(data[0])-i+1): if j > len(substr) and is_substr(data[0][i:i+j], data): substr = data[0][i:i+j] return substr def is_substr(find, data): if len(data) < 1 and len(find) < 1: return False for i in range(len(data)): if find not in data[i]: return False return True print long_substr(['Oh, hello, my friend.', 'I prefer Jelly Belly beans.', 'When hell freezes over!'])
No doubt the algorithm could be improved and I’ve not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
def long_substr(data): substr = '' if len(data) > 1 and len(data[0]) > 0: for i in range(len(data[0])): for j in range(len(data[0])-i+1): if j > len(substr) and all(data[0][i:i+j] in x for x in data): substr = data[0][i:i+j] return substr
Hope this helps,
Jason.