I’m trying to identify the subject in a sentence. I tried to use some of the code here:
import spacy
nlp = spacy.load("en_core_web_sm")
sent = "the python can be used to find objects."
# sent = "The bears in the forest, which has tall trees, are very scary"
doc = nlp(sent)
sentence = next(doc.sents)
for word in sentence:
    print(word, word.dep_)
This returns the results:
I would think that in this case "the python" would be the subject. In most cases the dep_ would be nsubj, but here it's nsubjpass. So if nsubj is not present I can check for nsubjpass, but are there any other dep_ values it could be?
Is there a more robust way to determine the subject?
Your sentence is in the passive voice. nsubjpass is the label for the subject of a passive clause.
You can find the list of dep_ labels by calling:
for label in nlp.get_pipe("parser").labels:
    print(label, " -- ", spacy.explain(label))
I can see there are 2 more subject types:
csubj -- clausal subject
csubjpass -- clausal subject (passive)
One possible way to determine the subject is to check whether the label contains "subj":
if "subj" in word.dep_:
    pass  # word is a subject; handle it here
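If you want something a bit stricter than the substring check, here is a minimal sketch that matches against the four subject labels mentioned above and also recovers the whole phrase (for example "the python" rather than just "python") through the token's subtree. The names SUBJECT_DEPS, find_subjects and subject_phrase are made up for this example and are not part of spaCy; only the dep_, text and subtree token attributes come from the library.

```python
# Subject labels discussed above; this set is an assumption of the sketch,
# so adjust it if your model's parser uses different labels.
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}


def find_subjects(tokens):
    """Return the tokens whose dependency label marks them as a subject.

    `tokens` can be a spaCy Doc or Span, or any iterable of objects
    that expose a `dep_` attribute.
    """
    return [tok for tok in tokens if tok.dep_ in SUBJECT_DEPS]


def subject_phrase(token):
    """Join the texts of the token's syntactic subtree into one phrase."""
    return " ".join(t.text for t in token.subtree)
```

With the doc from the question, find_subjects(doc) should pick out the python token, and subject_phrase on that token joins it with its determiner. Matching on an explicit set rather than on the "subj" substring is a design choice: it avoids accidentally matching a label you did not intend if a model uses a different label scheme.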