Input file is:
    l1 = ['Passing much less urine', 'Bleeding from any body part', 'Feeling extremely lethargic/weak', 'Excessive sleepiness/restlessness', 'Altered mental status', 'Seizure/fits', 'Breathlessness', 'Blood in sputum', 'Chest pain', 'Sound/noise in breathing', 'Drooling of saliva', 'Difficulty in opening mouth']

    k = []
    for n in range(0, len(l1)):
        e = l1[n]
        doc = nlp(e)
        for token in doc:
            if token.lemma_ != "-PRON-":
                temp = token.lemma_.lower().strip()
            else:
                temp = token.lower_
            k.append(temp)
        cleaned_tokens = []
        t = []
        d = []

        for token in k:
            li = []
            if token not in stopwords and token not in punct:
                cleaned_tokens.append(token)

        li = " ".join(cleaned_tokens)
        t.append(li)
        print(t)
This code gives output:
    ['pass urine']
    ['pass urine bleed body']
    ['pass urine bleed body feel extremely lethargic weak']
But the output I need is:

    ["pass urine", "bleed body", "feel extremely lethargic weak"]

Please suggest how I can get this result.
Answer
This produces the results you want:
    import spacy
    nlp = spacy.load("en_core_web_md")

    l1 = ['Passing much less urine', 'Bleeding from any body part', 'Feeling extremely lethargic/weak', 'Excessive sleepiness/restlessness', 'Altered mental status', 'Seizure/fits', 'Breathlessness', 'Blood in sputum', 'Chest pain', 'Sound/noise in breathing', 'Drooling of saliva', 'Difficulty in opening mouth']
    docs = nlp.pipe(l1)

    t = []
    for doc in docs:
        # Keep the lowercased text of every token that is neither a stop word nor punctuation
        clean_doc = " ".join([tok.text.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
        t.append(clean_doc)

    print(t)

    ['passing urine', 'bleeding body', 'feeling extremely lethargic weak', 'excessive sleepiness restlessness', 'altered mental status', 'seizure fits', 'breathlessness', 'blood sputum', 'chest pain', 'sound noise breathing', 'drooling saliva', 'difficulty opening mouth']
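As for why the code in the question produced cumulative strings: the list `k` is created once, before the loop, so every phrase is cleaned together with the tokens of all earlier phrases. The sketch below reproduces only that control flow. It does not use spaCy; `STOPWORDS`, `clean_shared`, and `clean_per_phrase` are hypothetical names, and whitespace splitting stands in for real tokenization.

```python
# Hypothetical stand-ins: a tiny stop word set and whitespace tokenization,
# used only to show the control flow without loading a spaCy model.
STOPWORDS = {"much", "less", "from", "any", "part"}

phrases = ["Passing much less urine", "Bleeding from any body part"]


def clean_shared(phrases):
    """Mimics the question: one token list shared across all phrases."""
    shared = []  # created once, so it keeps growing
    results = []
    for phrase in phrases:
        shared.extend(phrase.lower().split())
        results.append(" ".join(w for w in shared if w not in STOPWORDS))
    return results


def clean_per_phrase(phrases):
    """Mimics the answer: tokens are collected separately for each phrase."""
    results = []
    for phrase in phrases:
        tokens = phrase.lower().split()  # fresh list on every iteration
        results.append(" ".join(w for w in tokens if w not in STOPWORDS))
    return results


print(clean_shared(phrases))      # ['passing urine', 'passing urine bleeding body']
print(clean_per_phrase(phrases))  # ['passing urine', 'bleeding body']
```

Building the token list inside the loop, or using a per-document list comprehension as in the answer, keeps each phrase independent.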
In case you need lemmas instead:

    # nlp.pipe returns a generator, so create it again before iterating a second time
    docs = nlp.pipe(l1)

    t = []
    for doc in docs:
        # Same filter as before, but keep the lowercased lemma of each token
        clean_doc = " ".join([tok.lemma_.lower() for tok in doc if not tok.is_stop and not tok.is_punct])
        t.append(clean_doc)

    print(t)

    ['pass urine', 'bleed body', 'feel extremely lethargic weak', 'excessive sleepiness restlessness', 'alter mental status', 'seizure fit', 'breathlessness', 'blood sputum', 'chest pain', 'sound noise breathing', 'drool saliva', 'difficulty open mouth']
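One thing to watch when running both snippets in the same session: `nlp.pipe` yields its documents lazily, so the `docs` object can be looped over only once. The sketch below shows that behaviour with a plain Python generator. `fake_pipe` is a hypothetical stand-in rather than part of spaCy, and it only lowercases its input.

```python
def fake_pipe(texts):
    """Hypothetical stand-in for nlp.pipe: yields one item per text, lazily."""
    for text in texts:
        yield text.lower()


texts = ["Chest pain", "Seizure/fits"]

docs = fake_pipe(texts)
first_pass = [d for d in docs]
second_pass = [d for d in docs]  # the generator is already exhausted here

# Materializing the generator into a list makes it reusable.
docs_list = list(fake_pipe(texts))
reuse_one = [d for d in docs_list]
reuse_two = [d for d in docs_list]

print(first_pass)   # ['chest pain', 'seizure/fits']
print(second_pass)  # []
print(reuse_one == reuse_two)  # True
```

With spaCy, the same idea would be either to call `nlp.pipe(l1)` again before the second loop or to wrap the first call in `list(...)`.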