Identifying common elements in a list of words

Question

I have a list of words in a column where I need to find common elements. For example, the list contains words such as sinazz31, sinazz12, 45sinazz and sinazz_84. As you can see, the common element is “sinazz”. Is there a way to develop an algorithm in Python to identify such common elements? If the length of the words is less than 4,

Accepted Answer

You could search for substrings contained in all of the source strings, starting with the length of the shortest string and going down from there:

    string = 'sinazz31 sinazz12 45sinazz sinazz_84'
    min_substring_length = 3
    words = string.split()
    # a substring shared by every word cannot be longer than the shortest word
    shortest_word = min(filter(None, words), key=len)
    matches = {}

    for sub_length in range(len(shortest_word), min_substring_length - 1, -1):
        # + 1 so the substring ending at the last character is also checked
        for x in range(len(shortest_word) - sub_length + 1):
            substring = shortest_word[x:x + sub_length]  # create substring to check
            check = sum(1 for word in words if substring in word)  # number of words containing substring
            if check > 1:
                matches[substring] = check

    # results
    if matches:
        match_list = sorted(matches, key=matches.get, reverse=True)  # matches by frequency
        if matches[match_list[0]] == len(words):  # substring found in every word
            print('best match for all words:', match_list[0])
        print('best to worst:', match_list)
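If only the single longest substring shared by every word is needed, the same search can be wrapped in a function that stops at the first hit. This is a minimal sketch rather than part of the original answer: the name longest_common_substring and the min_length parameter are made up for illustration, and it assumes the words are already in a Python list.

```python
def longest_common_substring(words, min_length=3):
    """Return the longest substring found in every word, or None."""
    words = [w for w in words if w]  # ignore empty strings
    if not words:
        return None
    # A substring shared by all words must fit inside the shortest one.
    shortest = min(words, key=len)
    # Try the longest candidates first so the first hit is the answer.
    for length in range(len(shortest), min_length - 1, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in word for word in words):
                return candidate
    return None


print(longest_common_substring(["sinazz31", "sinazz12", "45sinazz", "sinazz_84"]))
# prints: sinazz
```

Because candidates are tried from longest to shortest, the function returns as soon as one is found in all words. It returns None when no shared substring of at least min_length characters exists.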

Answer