I’m using a Windows system with Python 3.7. When I install nltk it imports with no problem (I also already installed nltk from cmd), but when I run the code I get an error, and I don’t know how to fix it… However, the code works fine on my MacBook, so I’m wondering what’s going on with the Windows system. p.s
Tag: nltk
python nltk — stemming list of sentences/phrases
I have a bunch of sentences in a list and I want to use the nltk library to stem them. I am able to stem one sentence at a time, but I am having trouble stemming sentences from a list and joining them back together. Is there a step I am missing? I’m quite new to the nltk library. Thanks! Answer You’re passing a
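The usual fix is to stem word by word and rejoin per sentence. A minimal sketch of that loop-and-join pattern; `naive_stem` is a toy stand-in so the example runs without NLTK installed — in practice you would pass `nltk.stem.PorterStemmer().stem` (and tokenize with `nltk.word_tokenize` rather than `str.split`):

```python
def stem_sentences(sentences, stem):
    """Stem every word in each sentence, then join the words back."""
    return [" ".join(stem(word) for word in sentence.split())
            for sentence in sentences]

# Toy suffix-stripper for the demo only; with NLTK you would pass
# PorterStemmer().stem here instead.
def naive_stem(word):
    for suffix in ("ing", "ers", "er", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

print(stem_sentences(["the runners keep running"], naive_stem))
# ['the runn keep runn']
```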
Counting specific words in a sentence
I am currently trying to solve this homework question. My task is to implement a function that returns a vector of word counts in a given text. I am required to split the text into words and then use NLTK’s tokeniser to tokenise each sentence. This is the code I have so far: There are two doctests that should give the
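One common shape for such a function, sketched with `collections.Counter`. `str.split()` stands in for NLTK’s tokeniser here so the example is self-contained; the `vocabulary` argument is an assumption about how the count vector is ordered:

```python
from collections import Counter

def word_counts(text, vocabulary):
    """Return a vector of counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())  # swap in nltk.word_tokenize as needed
    return [counts[word] for word in vocabulary]

print(word_counts("the cat sat on the mat", ["the", "cat", "dog"]))
# [2, 1, 0]
```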
How to train Naive Bayes Classifier for n-gram (movie_reviews)
Below is the code for training a Naive Bayes classifier on the movie_reviews dataset with a unigram model. I want to train it and analyse its performance with bigram and trigram models. How can we do that? Answer Simply change your featurizer. BTW, your code will be a lot faster if you change your featurizer to use a set for your stopword list
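A sketch of what such a featurizer swap might look like — boolean n-gram features built with the zip-over-shifted-copies trick (the same idea as `nltk.util.ngrams`), with the stopword list held in a set for fast membership tests. Function and parameter names are illustrative, not from the original code:

```python
def ngram_features(words, n=2, stopwords=frozenset()):
    """Boolean n-gram presence features for an NLTK-style classifier.

    A set/frozenset for `stopwords` makes the `not in` check O(1),
    which is the speed-up the answer refers to.
    """
    words = [w.lower() for w in words if w.lower() not in stopwords]
    # zip over n shifted copies of the list yields the n-grams
    return {gram: True for gram in zip(*(words[i:] for i in range(n)))}

print(ngram_features(["a", "great", "movie"], n=2, stopwords={"a"}))
# {('great', 'movie'): True}
```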
TypeError: expected string or bytes-like object – with Python/NLTK word_tokenize
I have a dataset with ~40 columns and am using .apply(word_tokenize) on 5 of them, like so: df['token_column'] = df.column.apply(word_tokenize). I’m getting a TypeError for only one of the columns; we’ll call it problem_column. Here’s the full error (with df and column names and PII stripped). I’m new to Python and am still trying to figure out which parts of the
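The usual culprit is missing values: empty cells arrive as `float('nan')`, and `word_tokenize` raises this TypeError on anything that is not a string. A defensive per-cell sketch (with pandas the equivalent one-liner would be `df['col'] = df['col'].fillna('').astype(str)` before the `.apply`); `str.split` stands in for `word_tokenize` so the example needs no tokenizer models:

```python
import math

def tokenize_cell(value):
    """Tokenize one cell, tolerating NaN and non-string values."""
    if isinstance(value, float) and math.isnan(value):
        return []                      # empty cell -> no tokens
    return str(value).split()          # stand-in for nltk.word_tokenize

print(tokenize_cell(float("nan")), tokenize_cell("hello world"))
# [] ['hello', 'world']
```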
re.sub erroring with “Expected string or bytes-like object”
I have read multiple posts about this error, but I still can’t figure it out. When I try to loop through my function: Here is the error: Answer As you stated in the comments, some of the values appear to be floats, not strings. You will need to convert them to strings before passing them to re.sub. The simplest way
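A minimal sketch of that coercion, assuming a hypothetical cleaning function — the point is only the `str(value)` conversion before `re.sub` ever sees the value:

```python
import re

def clean(value):
    # re.sub requires a string; floats (e.g. NaN read from a CSV)
    # trigger "expected string or bytes-like object", so coerce first
    text = "" if value is None else str(value)
    return re.sub(r"[^a-z ]", "", text.lower())

print(clean("Hello, World!"), repr(clean(3.14)))
# hello world ''
```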
Python regex to remove punctuation except from URLs and decimal numbers
People, I need a regex to remove punctuation from a string but keep the accents and URLs. I also have to keep the mentions and hashtags in that string. I tried the code below, but unfortunately it strips the accented characters, and I want to keep the accents. The output for the following text “Apenas um teste com
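One way to get all four requirements at once: protect whole URL/mention/hashtag tokens, and elsewhere delete punctuation only when it is not sandwiched between digits (which preserves decimal points like 3.14). In Python 3, `\w` is Unicode-aware, so accented letters survive untouched. A sketch under those assumptions; tune the patterns to your data:

```python
import re

# Tokens to pass through verbatim: URLs, @mentions, #hashtags
KEEP = re.compile(r"^(?:https?://\S+|[@#]\w+)$")

def strip_punct(text):
    out = []
    for tok in text.split():
        if KEEP.match(tok):
            out.append(tok)
        else:
            # remove punctuation unless a digit sits on both sides of it
            out.append(re.sub(r"(?<!\d)[^\w\s@#]|[^\w\s@#](?!\d)", "", tok))
    return " ".join(t for t in out if t)

print(strip_punct("Olá, veja http://t.co/x e #teste: 3.14!"))
# Olá veja http://t.co/x e #teste 3.14
```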
ValueError: Could not find a default download directory of nltk
I have a problem importing nltk. I configured Apache and ran some sample Python code; it worked well in the browser. The URL is /localhost/cgi-bin/test.py. When I import nltk in test.py it does not run: execution does not continue after the “import nltk” line, and it gives me the error ValueError: Could not find a default download directory. But when
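NLTK derives its default download directory from the user’s home directory, and processes spawned by Apache/CGI typically run without HOME set, which is one common cause of this ValueError. A sketch of pointing NLTK at an explicit data directory instead — the path shown is an assumption; use a directory the web-server user can actually read:

```python
import os

# Set before importing nltk, so nltk.data picks it up; NLTK honours
# the NLTK_DATA environment variable when choosing data directories.
os.environ.setdefault("NLTK_DATA", "/var/www/nltk_data")  # assumed path

# Alternative, after `import nltk`:
#     nltk.data.path.append("/var/www/nltk_data")

print(os.environ["NLTK_DATA"])
```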
nltk: how to get inflections of words
I have a list of nearly 5000 English words, and for each word I need these inflectional forms: noun: singular and plural; verb: infinitive, present simple, present simple 3rd person, past simple, present participle (-ing form), past participle; adjective: comparative and superlative; adverb. How can I extract this information for a given word (e.g. help) in nltk via Python?
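Worth noting: NLTK itself only goes the other direction — WordNet’s `morphy` maps inflected forms back to lemmas and has no inflection *generator*; dedicated libraries such as LemmInflect or pattern are the usual tools for generating forms. Purely as an illustration of the rule-based flavour of the problem, a deliberately naive pluralizer (not a real solution for 5000 words):

```python
def plural(noun):
    """Toy English pluralizer -- irregulars (child, mouse, ...) are
    not handled; real coverage needs a dedicated inflection library."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and noun[-2:-1] not in "aeiou":
        return noun[:-1] + "ies"
    return noun + "s"

print([plural(w) for w in ["help", "box", "city", "day"]])
# ['helps', 'boxes', 'cities', 'days']
```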
how to use word_tokenize in data frame
I have recently started using the nltk module for text analysis, and I am stuck at one point. I want to use word_tokenize on a dataframe, so as to obtain all the words used in a particular row of the dataframe. Basically, I want to separate all the words and find the length of each text in the dataframe. I know
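The per-row pattern is a plain `.apply`. Sketched on a list of strings so the example is dependency-free; with pandas the same thing is `df['words'] = df['text'].astype(str).apply(word_tokenize)` followed by `df['n_words'] = df['words'].str.len()` (here `str.split` stands in for `word_tokenize`):

```python
texts = ["NLTK makes tokenizing easy", "short text"]  # stand-in for df['text']

# tokenize each row, then count tokens per row
words_per_row = [t.split() for t in texts]
lengths = [len(words) for words in words_per_row]

print(lengths)
# [4, 2]
```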