I’m looking for a Python library for finding the longest common sub-string from a set of strings. There are two ways to solve this problem:
- using suffix trees
- using dynamic programming.
Method implemented is not important. It is important it can be used for a set of strings (not only two strings).
Advertisement
Answer
These paired functions will find the longest common string in any arbitrary array of strings:
JavaScript
x
22
22
1
def long_substr(data):
2
substr = ''
3
if len(data) > 1 and len(data[0]) > 0:
4
for i in range(len(data[0])):
5
for j in range(len(data[0])-i+1):
6
if j > len(substr) and is_substr(data[0][i:i+j], data):
7
substr = data[0][i:i+j]
8
return substr
9
10
def is_substr(find, data):
11
if len(data) < 1 and len(find) < 1:
12
return False
13
for i in range(len(data)):
14
if find not in data[i]:
15
return False
16
return True
17
18
19
print long_substr(['Oh, hello, my friend.',
20
'I prefer Jelly Belly beans.',
21
'When hell freezes over!'])
22
No doubt the algorithm could be improved and I’ve not had a lot of exposure to Python, so maybe it could be more efficient syntactically as well, but it should do the job.
EDIT: in-lined the second is_substr function as demonstrated by J.F. Sebastian. Usage remains the same. Note: no change to algorithm.
JavaScript
1
9
1
def long_substr(data):
2
substr = ''
3
if len(data) > 1 and len(data[0]) > 0:
4
for i in range(len(data[0])):
5
for j in range(len(data[0])-i+1):
6
if j > len(substr) and all(data[0][i:i+j] in x for x in data):
7
substr = data[0][i:i+j]
8
return substr
9
Hope this helps,
Jason.