
How to train a Naive Bayes classifier on n-grams (movie_reviews)

Below is the code for training a Naive Bayes classifier on the movie_reviews dataset with a unigram model. I want to train it and analyze its performance with bigram and trigram models as well. How can I do that?



Answer

Simply change your featurizer so that it emits n-grams instead of single words.
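As a rough illustration of what such a featurizer could look like, here is a minimal pure-Python sketch. The names `ngrams`, `ngram_feats`, `STOPWORDS` and `PUNCTUATION` are made up for this example, and the tiny stopword set is only a stand-in for a real list. It returns the `{feature: True}` dictionary shape that NLTK's Naive Bayes classifier is normally trained on, with each n-gram tuple used as a feature name.

```python
from string import punctuation

# Hypothetical mini stopword set, for illustration only; swap in your full list.
STOPWORDS = frozenset({"a", "an", "and", "is", "it", "of", "the", "this", "to"})
PUNCTUATION = frozenset(punctuation)


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as a list of tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def ngram_feats(words, n=2):
    """Build a {ngram: True} presence-feature dict from a tokenized document."""
    kept = [
        w.lower()
        for w in words
        if w.lower() not in STOPWORDS and w not in PUNCTUATION
    ]
    return {gram: True for gram in ngrams(kept, n)}


# n=2 gives bigram features, n=3 gives trigram features.
features = ngram_feats("This movie is a complete waste of time .".split(), n=2)
print(features)
```

Switching between the bigram and trigram model is then just a matter of changing `n`; the rest of the training code can stay as it is.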


BTW, your code will be a lot faster if you change your featurizer to use a set for your stopword list and initialize it only once.
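A minimal sketch of that idea, assuming a placeholder stopword list (`STOPWORD_LIST` and `word_feats` are illustrative names, not taken from the question): the set is built once at module level, so the featurizer does not rebuild or rescan a list for every word of every document.

```python
# Hypothetical stopword list standing in for a real one.
STOPWORD_LIST = ["a", "an", "and", "is", "it", "of", "the", "this", "to"]

# Built once at import time. Membership tests on a set take constant time
# on average, whereas `w in some_list` scans the list element by element.
STOPWORD_SET = frozenset(STOPWORD_LIST)


def word_feats(words, stopwords=STOPWORD_SET):
    """Unigram presence features, skipping stopwords (case-insensitive)."""
    return {w: True for w in words if w.lower() not in stopwords}


feats = word_feats(["The", "plot", "is", "thin"])
print(feats)
```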


Someone should really tell the NLTK people to convert the stopwords list into a set type, since it is “technically” a collection of unique items (i.e. a set).
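To get a feel for the difference, here is a rough timing sketch using `timeit` on synthetic data. The token and stopword strings are invented for the example, and the actual numbers will vary by machine, so treat them as indicative only.

```python
import timeit

# Synthetic data: 200 fake stopwords and 2000 ordinary tokens.
stop_list = [f"stop{i}" for i in range(200)]
stop_set = frozenset(stop_list)
tokens = [f"tok{i}" for i in range(2000)] + stop_list


def filter_with(stopwords):
    """Drop every token that appears in `stopwords`."""
    return [t for t in tokens if t not in stopwords]


# Same filtering logic, only the container type differs.
list_time = timeit.timeit(lambda: filter_with(stop_list), number=20)
set_time = timeit.timeit(lambda: filter_with(stop_set), number=20)
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Both calls return the same filtered tokens; only the lookup cost changes, because each list lookup scans up to all 200 entries while each set lookup is a single hash probe on average.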


For the fun of benchmarking


[out]:


Your original code returns an accuracy of 0.725.

Use more orders of ngrams
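One way to combine several orders is to emit every n-gram from unigrams up to some maximum length as features. The sketch below is a plain-Python stand-in with illustrative names (`everygrams`, `everygram_feats`); NLTK also provides an `everygrams` helper in `nltk.util` that serves the same purpose.

```python
def everygrams(tokens, min_n=1, max_n=3):
    """Yield every contiguous n-gram with min_n <= n <= max_n, as tuples."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def everygram_feats(words, max_n=2):
    """Presence features for all n-grams of order 1 up to max_n."""
    return {gram: True for gram in everygrams(words, 1, max_n)}


# Unigrams and bigrams together for a three-token document.
feats = everygram_feats(["not", "very", "good"], max_n=2)
print(sorted(feats))
```

Keeping the lower orders alongside the higher ones lets the classifier fall back on single words when a particular bigram or trigram was rare in the training data.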


[out]:

User contributions licensed under: CC BY-SA