Classify dataframe rows according to string occurrence from a list

With the following dataframe:

                         Sentence
0  This is an example of sentence
1         This is another example
2     This is an dfferent example
3    A sentence is a bag of words
4                    Random words

And the following list:

['sentence', 'another', 'words']

What is the most efficient way to summarize the occurrence of each word from the list in each row of the column ‘Sentence’? I’m looking for the following result:

                         Sentence     word_occurence
0  This is an example of sentence           sentence
1         This is another example            another
2     This is an dfferent example                   
3    A sentence is a bag of words  [sentence, words]
4                    Random words              words

Thanks in advance!

Answer

You can also do it with the apply function (with NumPy imported as np):

w = ['sentence', 'another', 'words']
df.assign(word_occurence=lambda x: x.Sentence.apply(lambda s: np.array([witem for witem in w if witem in s])))
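For reference, here is a self-contained sketch of the same apply-based idea, using the sample data from the question. The helper name `find_words` is made up for illustration, and it returns plain Python lists rather than NumPy arrays, which is an assumption about what is convenient downstream:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Sentence": [
        "This is an example of sentence",
        "This is another example",
        "This is an dfferent example",
        "A sentence is a bag of words",
        "Random words",
    ]
})
w = ["sentence", "another", "words"]


def find_words(sentence, words):
    """Hypothetical helper: keep the words that occur as substrings of `sentence`."""
    return [word for word in words if word in sentence]


# assign returns a new dataframe; df itself is left unchanged
result = df.assign(word_occurence=df["Sentence"].apply(lambda s: find_words(s, w)))
print(result)
```

Each cell of `word_occurence` then holds a list: `['sentence']`, `['another']`, `[]`, `['sentence', 'words']` and `['words']` for rows 0 to 4. Note that `in` does substring matching, so 'sentence' would also match a row containing 'sentences'; if whole-word matching matters, compare against `s.split()` instead.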

User contributions licensed under CC BY-SA