Create dictionary of context words without stopwords

Question

I am trying to create a dictionary that maps each word in a text to its context. The context should be the list of words that occur within a 5-word window (two words on either side) of the term's position in the string. I also want to leave stopwords out of the output vectors. My code is below. I can get

Accepted Answer

```python
words = ["This", "is", "a", "longer", "example", "sentence"]
stopwords = {"it", "the", "was", "of", "is", "a"}
context_size = 2  # number of words taken on each side of the term

stripes = []
for index, word in enumerate(words):
    # Stopwords do not get an entry of their own.
    if word.lower() in stopwords:
        continue
    # Clamp the window to the bounds of the list.
    i = max(index - context_size, 0)
    j = min(index + context_size, len(words) - 1) + 1
    # Words before and after the term, excluding the term itself.
    context = words[i:index] + words[index + 1:j]
    stripes.append((word, context))

print(stripes)
```

I would recommend using a list of tuples rather than a dict, so that if a word occurs more than once in words, a later occurrence does not overwrite the earlier ones. I would also put the stopwords in a set, especially if it is a larger list like NLTK's stopwords, since membership tests on a set are faster than on a list.

I also excluded the word itself from its context, but depending on how you want to use it, you might want to include it.

This results in:

```
[('This', ['is', 'a']), ('longer', ['is', 'a', 'example', 'sentence']), ('example', ['a', 'longer', 'sentence']), ('sentence', ['longer', 'example'])]
```
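Note that the output of the code in this answer still contains stopwords such as 'is' and 'a' inside the context lists; only stopwords as terms are skipped. If the contexts should be free of stopwords as well, one possible variant is sketched below. The names build_stripes and group_contexts are hypothetical helpers introduced here for illustration, and the sketch assumes stopwords are dropped after the window is taken, so a context can end up shorter than four words or empty. The second helper shows one way to get back to a dictionary without losing repeated words, by merging the contexts of every occurrence.

```python
from collections import defaultdict


def build_stripes(words, stopwords, context_size=2):
    """Return (word, context) pairs with stopwords removed from terms and contexts."""
    stripes = []
    for index, word in enumerate(words):
        if word.lower() in stopwords:
            continue
        start = max(index - context_size, 0)
        end = min(index + context_size + 1, len(words))
        window = words[start:index] + words[index + 1:end]
        # Assumption: filter after windowing, so the window size is not refilled.
        context = [w for w in window if w.lower() not in stopwords]
        stripes.append((word, context))
    return stripes


def group_contexts(stripes):
    """Merge the contexts of repeated words into one dict entry per word."""
    grouped = defaultdict(list)
    for word, context in stripes:
        grouped[word].extend(context)
    return dict(grouped)


words = ["This", "is", "a", "longer", "example", "sentence"]
stopwords = {"it", "the", "was", "of", "is", "a"}

stripes = build_stripes(words, stopwords)
print(stripes)
print(group_contexts(stripes))
```

If the window should instead always hold two non-stopwords on each side, filter the stopwords out of words first and then run the original loop on the filtered list.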

Answer