Python comparing two lists and filtering items

I would like to do some word filtering: extracting only the items in the ‘keyword’ list that also exist in ‘whitelist’.

Here is my code so far:

whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
keyword_filter = []
 
for word in whitelist:
    for i in range(len(keyword)):
        if word in keyword[i]:
            keyword_filter.append(word)
        else: pass

I want to remove every word except for ‘Cat’, ‘Dog’, and ‘Cow’ (which are in the ‘whitelist’) so that the result (‘keyword_filter’ list) will look like this:

['Cat, Cow', 'Dog', '', 'Cat']

However, I get a result like this:

['Cat', 'Cat', 'Dog', 'Cow']

I would sincerely appreciate it if you could give some advice.

Answer

You need to split each string in the list and check whether each word from the split is in the whitelist. Then rejoin the words that passed the filter:

whitelist = {'Cat', 'Dog', 'Cow'}
filtered = []
for words in keyword:
    filtered.append(', '.join(w for w in words.split(', ') if w in whitelist))

print(filtered)
# ['Cat, Cow', 'Dog', '', 'Cat']

It is better to make whitelist a set, which speeds up the membership lookup for each word.
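The split, filter, and rejoin steps above could be wrapped in a small reusable function. This is only a sketch: the name filter_keywords is made up for illustration, and stripping whitespace around each word is an added assumption so that inputs like 'Cat,Cow' and 'Cat, Cow' behave the same.

```python
def filter_keywords(keyword, whitelist):
    """Keep only whitelisted words in each comma-separated string.

    filter_keywords is a hypothetical helper name, not part of any library.
    """
    allowed = set(whitelist)  # a set gives fast membership tests
    filtered = []
    for words in keyword:
        # Split on commas and strip whitespace (an assumption about the input format)
        kept = [w.strip() for w in words.split(',') if w.strip() in allowed]
        filtered.append(', '.join(kept))
    return filtered


whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']
result = filter_keywords(keyword, whitelist)
print(result)
# ['Cat, Cow', 'Dog', '', 'Cat']
```

Because each word is compared as a whole after splitting, an entry such as 'Caterpillar' is not kept just because it contains 'Cat', which is what a substring test like `word in keyword[i]` would do.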

You could also use re.findall to find every part of each string that matches a word in the whitelist, and then rejoin the matches:

import re

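# Each alternative optionally consumes a leading comma and one whitespace character
pattern = re.compile(r',?\s?Cat|,?\s?Dog|,?\s?Cow')
# lstrip removes the leading ', ' left behind when the first word of a string was filtered out
filtered = [''.join(pattern.findall(words)).lstrip(', ') for words in keyword]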
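If the whitelist changes often, the pattern could be built from it instead of being written by hand. The following is a sketch of one possible variant rather than the code above: it matches whole words with \b and joins the matches with ', ', so no separators need to be captured. It assumes the whitelist entries are plain words.

```python
import re

whitelist = ['Cat', 'Dog', 'Cow']
keyword = ['Cat, Cow, Horse', 'Bird, Whale, Dog', 'Pig, Chicken', 'Tiger, Cat']

# re.escape protects against regex metacharacters in the whitelist words;
# \b restricts matches to whole words, so 'Cat' does not match inside 'Caterpillar'.
pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, whitelist)) + r')\b')

# findall returns the matched words in order; join them back with ', '
filtered = [', '.join(pattern.findall(words)) for words in keyword]
print(filtered)
# ['Cat, Cow', 'Dog', '', 'Cat']
```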
User contributions licensed under: CC BY-SA