I have a bunch of sentences in a list and I want to use the nltk library to stem them. I am able to stem one sentence at a time; however, I am having issues stemming the sentences from a list and joining them back together. Is there a step I am missing? I am quite new to the nltk library. Thanks!
import nltk
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

# Success: one sentence at a time
data = 'the gamers playing games'
words = word_tokenize(data)
for w in words:
    print(ps.stem(w))

# Fails:
data_list = ['the gamers playing games',
             'higher scores',
             'sports']
words = word_tokenize(data_list)
for w in words:
    print(ps.stem(w))

# Error: TypeError: expected string or bytes-like object
# result should be:
# ['the gamer play game',
#  'higher score',
#  'sport']
Answer
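You’re passing a list to word_tokenize, which only accepts a single string; that is what raises the TypeError.
The solution is to wrap your logic in another for-loop that goes through the list one sentence at a time: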
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()

data_list = ['the gamers playing games', 'higher scores', 'sports']
for sentence in data_list:
    # Tokenize one sentence (a single string) at a time
    words = word_tokenize(sentence)
    for w in words:
        print(ps.stem(w))

Output:
the
gamer
play
game
higher
score
sport
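The loop above prints one stem per line, while the question asks for a list of stemmed sentences. Here is a minimal sketch of one way to join the stems back together per sentence. The function name `stem_sentences` and its `tokenize` parameter are made up for illustration and are not part of nltk; only `PorterStemmer` and `word_tokenize` come from the library.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

ps = PorterStemmer()


def stem_sentences(sentences, tokenize=word_tokenize):
    """Stem every word of every sentence and re-join each sentence.

    `stem_sentences` and `tokenize` are illustrative names, not nltk API.
    """
    stemmed = []
    for sentence in sentences:
        # Tokenize one string at a time; word_tokenize rejects a list.
        stems = [ps.stem(word) for word in tokenize(sentence)]
        # Re-join the stems with single spaces to rebuild the sentence.
        stemmed.append(' '.join(stems))
    return stemmed
```

For the list in the question, `stem_sentences(data_list)` should give `['the gamer play game', 'higher score', 'sport']`, which matches the stems printed above. Note that word_tokenize needs NLTK's Punkt tokenizer data to be downloaded first (typically `nltk.download('punkt')`; newer releases may ask for `punkt_tab`, and the error message names the resource). Joining with spaces also leaves a space before punctuation tokens, so sentences containing punctuation would need extra handling.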