Skip to content
Advertisement

Check substring with 1 character difference in python

I have list of thousand of strings sub_strings, and list of millions of strings strings

I want to check if strings[i] have substring from sub_strings or substring that’s different in 1 character

sub_strings = ['hello']
strings =     ['hell dude whats good',
               'hllo',
               'hallo',
               'hello',
               'dude whats good']

is_substring_no_more_then_1_differnce(strings , sub_strings)

expected

[True, True, True, True, False]

Advertisement

Answer

Here’s a route for solving:

from Levenshtein import distance as lev

sub_strings = ['hello', 'bob']
strings =     ['hell dude whats good',
               'hllo',
               'hallo',
               'hello',
               'dude whats good',
               'bobo wants some food']

distance_list = []
for sentence in strings:
    distance_list.append(min([lev(word, substring) for word in sentence.split() for substring in sub_strings]))

print([x <= 1 for x in distance_list])

That will spit out [True, True, True, True, False, True]

But this is going to get very slow as you add elements to either list. Every individual word inside a string inside strings has to be checked against every word in substrings. That’s a lot of checking when you have millions of words in each string in strings and thousands of subtrings.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement