Tag: nltk

How do I read the following lines of code?

Apologies for the basic question, as I am quite new to the topic. Can you please break down the code above in the format given below: Answer: I think it is better for you to look up the following subjects: list comprehensions and the zip() function. This will give you a better understanding of what is happening and it

issue

nltk python tokenize

It might be a basic question, but I am stuck here and not really sure what went wrong. df['text'] contains the text data that I want to work on, and it returns [<nltk.tokenize.casual.TweetTokenizer object at 0x7f80216950a0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f8022278670>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7fec0bbc70>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74970>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf747c0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74a90>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf748b0>, <nltk.tokenize.casual.TweetTokenizer
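The code is not shown in the excerpt, so the cause can only be guessed at. A list of TweetTokenizer objects is what you would see if the class were constructed once per row instead of its tokenize() method being called. A minimal sketch under that assumption, using a plain list of made-up strings in place of df['text']:

```python
from nltk.tokenize import TweetTokenizer

# Hypothetical sample data standing in for df['text'].
texts = ["Hello world", "NLTK is fun"]

# Build the tokenizer once, then call its tokenize() method on each string.
tokenizer = TweetTokenizer()
tokens = [tokenizer.tokenize(text) for text in texts]

print(tokens)
```

With a pandas column, the same idea would be df['text'].apply(tokenizer.tokenize), which passes each string to the method rather than to the class.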

UnboundLocalError: local variable referenced before assignment doesn’t work in command line call

command-line global nltk python

I am aware that there are many solutions to this kind of question. However, none of them seems to have helped with my case. This is the code I’m referring to: When I try this code in PyCharm, it works without problems. However, if I use it via command line call it gives this error: Of course, for the command

Architecture Not Supported Error when installing nltk with pip on Mac

macos-catalina nltk pip python

New MacBook Pro running Catalina. I have a virtualenv with no additional libraries installed yet. When I try to install nltk with pip3 install nltk, I get the following long error, the gist of it being “Architecture Not Supported”. I tried installing with pip3 install -U but got a similar failure. Below is all of the terminal text, beginning with

Searching over a list of individual sentences by a specific term in Python

nlp nltk python string

I have a list of terms in Python that looks like this, as well as a list of individual sentences in a data frame that may contain the name of that fruit. Something similar to this: And I want to take the sentences in the review column, match them with the fruit mentioned in the text and print out a
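The actual lists are not included in the excerpt, so the following is only a sketch of one way to do the matching. The names fruits, reviews and find_terms, and the sample data, are made up for illustration. It looks for each term as a whole word, ignoring case:

```python
import re

def find_terms(sentence, terms):
    """Return the terms that occur as whole words in the sentence (case-insensitive)."""
    found = []
    for term in terms:
        # \b keeps "apple" from matching inside "pineapple".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found

# Hypothetical sample data.
fruits = ["apple", "banana", "cherry"]
reviews = [
    "I loved the apple pie.",
    "Banana bread and cherry jam were fine.",
    "Nothing fruity here.",
]

matches = {review: find_terms(review, fruits) for review in reviews}
for review, found in matches.items():
    print(review, "->", found)
```

Whole-word matching also means plurals such as "apples" are not matched. If the sentences live in a data frame, the same function could be applied to the review column row by row.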

How to parse guess_language to read 30000 tweets?

nltk pandas python

I am using guess_language to detect the language of the tweets for a school project. I used pandas to read the .csv file. I have around 30000 rows. However, my problem is that guess_language can only read one tweet at a time. For example, guess_language("Top story: ‘Massive Mental Health Crisis’ ") returns 'en'. I am very new at Python and been

How to use spacy to do Name Entity recognition on CSV file

csv named-entity-recognition nltk pandas python

I have tried so many things to do named entity recognition on a column in my CSV file. I tried ne_chunk, but I am unable to get the result of ne_chunk in columns like so: Instead, after using this code, I got this error: So I am wondering if I could do this using spaCy, which is another thing

How to strip string from punctuation except apostrophes for NLP

nlp nltk python

I am using the below “fastest” way of removing punctuation from a string: However, it removes all punctuation, including apostrophes, from tokens such as shouldn’t, turning it into shouldnt. The problem is that I am using the NLTK library for stopwords, and the standard stopwords don’t include such examples without apostrophes but instead have tokens that NLTK would generate if I used
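The original snippet is not shown in the excerpt. One common approach, sketched here as an assumption rather than as the asker's code, is to build a translation table from string.punctuation with the apostrophe left out. The name strip_punctuation is made up for illustration:

```python
import string

# All ASCII punctuation except the straight apostrophe.
PUNCTUATION_TO_REMOVE = string.punctuation.replace("'", "")

# Translation table that deletes every character in PUNCTUATION_TO_REMOVE.
TABLE = str.maketrans("", "", PUNCTUATION_TO_REMOVE)

def strip_punctuation(text):
    """Remove ASCII punctuation from text while keeping apostrophes."""
    return text.translate(TABLE)

print(strip_punctuation("Well, you shouldn't do that!"))
```

Contractions such as shouldn't stay intact, so they can still be compared against a stopword list that contains them. Note that string.punctuation only covers ASCII characters, so typographic quotes and apostrophes are left untouched either way.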

Using Pyinstaller with NLTK results in error: can’t find nltk_data

exe nltk pyinstaller python

I am attempting to export a simple GUI that uses NLTK as an exe with Python 3.6 and Windows 10. When I run PyInstaller to freeze my simple program as an exe, I get the error: Unable to find “c:\users\usr\nltk_data” when adding binary and data files. Even when I copied the nltk_data folder here, I get an error in

how to compare two text document with tfidf vectorizer?

I have two different texts which I want to compare using TF-IDF vectorization. What I am doing is: (1) tokenizing each document, (2) vectorizing using TfidfVectorizer.fit_transform(tokens_list). Now the vectors that I get after step 2 have different shapes. But as per the concept, both vectors should have the same shape. Only then can the vectors be compared. What
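Vectors of different shapes would be the expected result if each document were vectorized separately, because each fit then builds its own vocabulary. Whether that is what happened here is an assumption, as the code is not shown. The sketch below uses only the standard library to show the idea of one shared vocabulary. The helper names and sample texts are made up, and the smoothed IDF, ln((1 + n) / (1 + df)) + 1, only approximates what a library would compute:

```python
import math
from collections import Counter

def tokenize(text):
    """Very simple tokenizer: lowercase and split on whitespace."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Build TF-IDF vectors for all docs over one shared, sorted vocabulary."""
    tokenized = [tokenize(doc) for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical sample documents.
doc_a = "the cat sat on the mat"
doc_b = "the dog sat on the log"

vocab, (vec_a, vec_b) = tfidf_vectors([doc_a, doc_b])
similarity = cosine(vec_a, vec_b)
print(len(vec_a), len(vec_b), similarity)
```

In scikit-learn the equivalent step is to fit a single TfidfVectorizer on both documents at once, for example fit_transform([doc_a, doc_b]), which returns one row per document over the same columns.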