I’m a total noob so sorry if I’m asking something obvious. My question is twofold, or rather it’s two questions in the same topic:
- I’m studying nltk at uni, and we’re doing chunking. In my notes I have the following grammar:
grammar = r"""
  NP: {<DT|PP$>?<JJ>*<NN.*>+}   # noun phrase
  PP: {<IN><NP>}                # prepositional phrase
  VP: {<MD>?<VB.*><NP|PP>}      # verb phrase
  CLAUSE: {<NP><VP>}            # full clause
"""
What is the “$” symbol for in this case? I know it’s “end of the line” in regex, but what does it stand for here?
- Also, in my textbook there’s a Tree that’s been printed without using the .draw() function, giving this result:
Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
How the heck does one do that???
Thanks in advance to anybody who’ll have the patience to school this noob :D
Answer
This is the code of your example:
import nltk

# POS-tagged sentence: a list of (word, tag) pairs
sentence = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'), ('many', 'JJ'), ('chapters', 'NNS')]

# Note PP\$: the dollar sign is escaped so that it matches the tag PP$ literally
grammar = r"""
  NP: {<DT|PP\$>?<JJ>*<NN.*>+}  # noun phrase
  PP: {<IN><NP>}                # prepositional phrase
  VP: {<MD>?<VB.*><NP|PP>}      # verb phrase
  CLAUSE: {<NP><VP>}            # full clause
"""

cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
# output (line breaks may differ):
# (S (CLAUSE (NP the/DT book/NN) (VP has/VBZ (NP many/JJ chapters/NNS))))

result.draw()  # opens a window with the tree drawn graphically
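This is the tree from your textbook, shown in its repr() form (what the interactive interpreter displays when you evaluate the tree without print):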
Tree('S', [Tree('NP', [('the', 'DT'), ('book', 'NN')]), ('has', 'VBZ'), Tree('NP', [('many', 'JJ'), ('chapters', 'NNS')])])
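To get that exact Tree(...) text rather than the bracketed form, something like the sketch below should work. It assumes the textbook used a grammar with only the NP rule (the name np_grammar is mine, not from your notes), since the quoted tree contains NP chunks but no VP or CLAUSE nodes.

```python
import nltk

sentence = [('the', 'DT'), ('book', 'NN'), ('has', 'VBZ'),
            ('many', 'JJ'), ('chapters', 'NNS')]

# Assumption: a grammar holding only the NP rule from the question,
# with the dollar sign escaped so it matches the tag PP$ literally.
np_grammar = r"NP: {<DT|PP\$>?<JJ>*<NN.*>+}"
tree = nltk.RegexpParser(np_grammar).parse(sentence)

bracketed = str(tree)     # the bracketed form that print(tree) shows
constructor = repr(tree)  # the Tree('S', [...]) form from the textbook

print(bracketed)
print(constructor)
```

In an interactive session (Python shell, IPython, Jupyter) you get the same Tree(...) text by typing just the variable name, because the interpreter displays repr() of the value.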
I found this link where you can learn a lot.
The $ symbol is a special character in regular expressions (it anchors the match at the end of the string), so it must be backslash-escaped, as in PP\$, in order to match the tag PP$ literally.
Example of an unescaped $:
xyz$ -> matches the pattern xyz only at the end of a string
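The difference between the anchor and the escaped dollar sign can be checked with Python's own re module. This is only an illustration with plain regular expressions on a made-up list of tags; nltk's tag patterns add their own syntax (the angle brackets) on top of this.

```python
import re

# Hypothetical list of tags to test against
tags = ["PP$", "PP", "DT"]

# Unescaped: $ is the end-of-string anchor, so this means "PP, then end of string"
anchored = re.compile(r"PP$")
# Escaped: \$ is a literal dollar sign, so this means the three characters "PP$"
literal = re.compile(r"PP\$")

anchored_hits = [t for t in tags if anchored.fullmatch(t)]
literal_hits = [t for t in tags if literal.fullmatch(t)]

print(anchored_hits)  # ['PP']
print(literal_hits)   # ['PP$']
```

So without the backslash the pattern can never match the tag PP$ itself, which is why the grammar needs PP\$.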