I want to use the nltk library to extract, from text files, all trigrams whose middle term is an adjective.
Example Text – A red ball was with the good boy.
Example of output –
('A','red','ball'), ('the','good','boy')
and so on
Answer
This code should do it:
import nltk
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

text = word_tokenize("He is a very handsome man. Her children are funny. She has a lovely voice")
text_tags = nltk.pos_tag(text)

results = list()
for i, (txt, tag) in enumerate(text_tags):
    # JJ, JJR and JJS are the Penn Treebank adjective tags
    if tag in ["JJ", "JJR", "JJS"]:
        # skip an adjective at the very start or end: it has no neighbour on one side
        if (i > 0) and (i < len(text_tags)-1):
            results.append((text_tags[i-1][0], txt, text_tags[i+1][0]))

# output: [('very', 'handsome', 'man'), ('are', 'funny', '.'), ('a', 'lovely', 'voice')]
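Note that the output above includes ('are', 'funny', '.'), where a punctuation token counts as a neighbour. If that is not wanted, the filtering step can be written as a small function over already-tagged tokens. The following is only a sketch: the function name adjective_trigrams and the skip_punctuation flag are hypothetical, and the tags in the example are written by hand for the sentence from the question rather than taken from a tagger. In practice the input would be the result of nltk.pos_tag(word_tokenize(text)).

```python
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags


def adjective_trigrams(tagged, skip_punctuation=False):
    """Return (previous, adjective, next) word triples from (word, tag) pairs.

    Hypothetical helper: `tagged` is expected to look like the output of
    nltk.pos_tag, i.e. a list of (word, tag) tuples.
    """
    results = []
    # zip over three offset views of the list yields every consecutive triple
    for (prev_w, _), (mid_w, mid_tag), (next_w, _) in zip(tagged, tagged[1:], tagged[2:]):
        if mid_tag not in ADJECTIVE_TAGS:
            continue
        if skip_punctuation:
            # drop the triple if a neighbour has no letter or digit at all
            if not any(c.isalnum() for c in prev_w) or not any(c.isalnum() for c in next_w):
                continue
        results.append((prev_w, mid_w, next_w))
    return results


# Hand-written tags (an assumption, not tagger output) for the question's example
example = [("A", "DT"), ("red", "JJ"), ("ball", "NN"), ("was", "VBD"),
           ("with", "IN"), ("the", "DT"), ("good", "JJ"), ("boy", "NN"), (".", ".")]
print(adjective_trigrams(example))
```

Calling adjective_trigrams(tagged, skip_punctuation=True) keeps only triples whose two neighbours contain at least one letter or digit.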