
Create dictionary of context words without stopwords

I am trying to create a dictionary of the words in a text and their context. The context should be the list of words that occur within a five-word window (two words on either side) of the term's position in the string. I also want to leave stopwords out of the output vectors.

My code is below. I can remove the stopwords from my dictionary's keys but not from its values.

JavaScript

The output is:

JavaScript


Answer

JavaScript

I would recommend using a list of tuples instead of a dict. If a word occurs more than once in words, a dict keeps only the last occurrence, which overwrites the previous ones, whereas a list of tuples keeps every occurrence. I would also put the stopwords in a set, especially if it's a larger list like NLTK's stopwords, since membership tests on a set are faster than on a list.

I also excluded the word itself from its context, but depending on how you want to use the result, you might want to include it.

This results in:

JavaScript
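The approach described in this answer can be sketched roughly as follows. This is a minimal illustration, not the original code: the function name context_pairs, the sample sentence, and the small hand-written stopword set are all hypothetical, and the sketch assumes that the window is taken over the original token positions first, with stopwords dropped from the context afterwards. With NLTK, the stopword set could be built from its stopwords corpus instead.

```python
def context_pairs(tokens, stopwords, window=2):
    """Return a list of (word, context) tuples, one per non-stopword token.

    Assumption: the window covers `window` tokens on each side of the
    word's original position; stopwords are then filtered out of it.
    """
    stop = set(stopwords)  # set membership tests are O(1) on average
    pairs = []
    for i, word in enumerate(tokens):
        if word in stop:
            continue  # stopwords never become a "key"
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        # The word itself is excluded; only its neighbours are kept.
        context = [w for w in left + right if w not in stop]
        pairs.append((word, context))
    return pairs


# Hypothetical sample data, for illustration only.
text = "the quick brown fox jumps over the lazy dog and the quick cat"
tokens = text.split()
stopwords = {"the", "over", "and"}

pairs = context_pairs(tokens, stopwords)
for word, context in pairs:
    print(word, context)
```

Because the result is a list of tuples, both occurrences of "quick" appear with their own contexts. To include the word itself in its context, the slice tokens[max(0, i - window):i + 1 + window] could be filtered instead of left + right.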
User contributions licensed under: CC BY-SA