Identify subject in sentences using spacy in advanced cases

I’m trying to identify the subject in a sentence. I tried to use some of the code here:

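import spacy

nlp = spacy.load("en_core_web_sm")
sent = "the python can be used to find objects."
#sent = "The bears in the forest, which has tall trees, are very scary"

doc = nlp(sent)  # parse the text; doc.sents needs the dependency parse
sentence = next(doc.sents)

# print each token with its dependency label
for word in sentence:
    print(word, word.dep_)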

This returns the results:

  • the det
  • python nsubjpass
  • can aux
  • be auxpass
  • used ROOT
  • to aux
  • find xcomp
  • objects dobj

I would think that in this case "the python" is the subject. In most cases the subject's dep_ would be nsubj, but here it is nsubjpass. So if nsubj is not present I can check for nsubjpass, but are there any other dep_ values it could be?

Is there a more robust way to determine the subject?


Your sentence is in the passive voice. nsubjpass is the label given to the subject of a passive clause.

You can find the list of dep_ by calling

for label in nlp.get_pipe("parser").labels:
    print(label, " -- ", spacy.explain(label))

I can see there are 2 more subject types:

csubj  --  clausal subject
csubjpass  --  clausal subject (passive)
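To pick the subject-like labels out of that list without reading through all of it, a small filter is enough. This is only a sketch: `subject_labels` is a hypothetical helper name, and it assumes every subject label contains the substring "subj", which holds for the labels shown here but may not hold for other models.

```python
def subject_labels(labels):
    """Return the dependency labels that contain "subj", sorted alphabetically."""
    return sorted(label for label in labels if "subj" in label)

# With a loaded pipeline, pass the parser's own label tuple:
#   subject_labels(nlp.get_pipe("parser").labels)
# Here a hand-written sample of labels stands in for it.
sample = ("det", "nsubj", "aux", "nsubjpass", "dobj", "csubj", "csubjpass")
print(subject_labels(sample))
# prints ['csubj', 'csubjpass', 'nsubj', 'nsubjpass']
```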

One possible way to determine the subject:

for word in sentence:
    if "subj" in word.dep_:
        print(word, word.dep_)  # word is a subject of some kind
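That check returns only the head word of the subject ("python"). One way to get the whole phrase ("the python") is to join the head's subtree. The sketch below assumes the four labels named in this thread are the only subject labels. `SUBJECT_DEPS`, `find_subjects`, `subject_phrase`, and `print_subjects` are names made up for illustration. It relies only on the `dep_`, `text`, and `subtree` attributes of spaCy tokens.

```python
# Assumed set of subject labels, taken from the ones named in this thread.
SUBJECT_DEPS = {"nsubj", "nsubjpass", "csubj", "csubjpass"}


def find_subjects(tokens):
    """Return the tokens whose dependency label is one of SUBJECT_DEPS."""
    return [tok for tok in tokens if tok.dep_ in SUBJECT_DEPS]


def subject_phrase(token):
    """Join the token's subtree (the token and all its descendants) into a string."""
    return " ".join(t.text for t in token.subtree)


def print_subjects(doc):
    """Print each subject phrase found in a parsed spaCy Doc, with its label."""
    for tok in find_subjects(doc):
        print(subject_phrase(tok), "--", tok.dep_)
```

With the pipeline from the question, the call would be `print_subjects(nlp(sent))`. Matching against an explicit set instead of the substring "subj" keeps the check from picking up any other label that happens to contain those letters. A subtree also includes every modifier attached to the subject, such as prepositional phrases and relative clauses, so the printed phrase can be long.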

Source: stackoverflow