I’m trying to identify the subject in a sentence. I tried to use some of the code here:
    import spacy

    nlp = spacy.load("en_core_web_sm")
    sent = "the python can be used to find objects."
    #sent = "The bears in the forest, which has tall trees, are very scary"
    doc = nlp(sent)

    # doc.sents is a generator, so next() returns the first sentence
    sentence = next(doc.sents)

    for word in sentence:
        print(word, word.dep_)
This returns the results:
- the det
- python nsubjpass
- can aux
- be auxpass
- used ROOT
- to aux
- find xcomp
- objects dobj
I would think that in this case "the python" would be the subject. In most cases the dep_ would be nsubj, but here it's nsubjpass. So if nsubj is not present I can check for nsubjpass, but are there any other dep_ values it could be?
Is there a more robust way to determine subject?
Answer
Your sentence is a passive voice example. nsubjpass is the subject when using passive voice.
You can find the list of dep_ labels by calling:
    for label in nlp.get_pipe("parser").labels:
        print(label, " -- ", spacy.explain(label))
I can see there are 2 more subject types:
    csubj -- clausal subject
    csubjpass -- clausal subject (passive)
One possible way to determine the subject:
    for word in sentence:
        if "subj" in word.dep_:
            print(word, word.dep_)  # handle the subject token here
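A slightly stricter variant of the check above is to match against an explicit set of the four subject labels rather than a substring. The sketch below is only an illustration: `SUBJECT_DEPS` and `find_subjects` are hypothetical names, and the stand-in tokens just copy the parse from the question so the example runs without loading a model.

```python
from types import SimpleNamespace

# The four subject labels mentioned above (assumed to be the only ones of interest).
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}


def find_subjects(tokens):
    """Return the tokens whose dependency label is one of SUBJECT_DEPS."""
    return [tok for tok in tokens if tok.dep_ in SUBJECT_DEPS]


# Stand-in tokens copying the (word, dep_) pairs printed in the question.
# Any object with a .dep_ attribute works, including real spaCy tokens.
parse = [
    ("the", "det"), ("python", "nsubjpass"), ("can", "aux"),
    ("be", "auxpass"), ("used", "ROOT"), ("to", "aux"),
    ("find", "xcomp"), ("objects", "dobj"),
]
tokens = [SimpleNamespace(text=text, dep_=dep) for text, dep in parse]

subjects = find_subjects(tokens)
print([(tok.text, tok.dep_) for tok in subjects])  # [('python', 'nsubjpass')]
```

With spaCy itself you should be able to pass the sentence from the question directly, e.g. `find_subjects(sentence)`, and iterate over `token.subtree` of each result if you want the whole phrase ("the python") rather than just the head word. An explicit set will not pick up an unrelated label that happens to contain "subj", at the cost of having to update it if your model uses a different label scheme.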