
Tag: nltk

How do I read the following lines of code?

Apologies for the basic question, as I am quite new to the topic. Can you please break down the code above in the format given below?

Answer: I think it is better for you to look up the following subjects: list comprehensions and the zip() function. This will give you a better understanding of what is happening and it …
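The code being asked about is not included in this excerpt, so the following is only a generic sketch of the two features the answer points to. The lists `words` and `tags` are made-up sample data, not taken from the question. It shows a list comprehension over `zip()` next to the equivalent explicit loop, which is often the easiest way to "read" such a line.

```python
# Hypothetical sample data for illustration only.
words = ["the", "cat", "sat"]
tags = ["DET", "NOUN", "VERB"]

# zip() pairs up items position by position; the list comprehension
# builds a new list from those pairs in a single expression.
pairs = [(word, tag) for word, tag in zip(words, tags)]

# The same logic written as an explicit loop.
pairs_loop = []
for word, tag in zip(words, tags):
    pairs_loop.append((word, tag))
```

Reading a comprehension from the `for` clause first, then the expression at the front, usually maps it directly onto the loop form.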

issue

It might be a basic question, but I am stuck here and not really sure what went wrong. df['text'] contains the text data that I want to work on, and it returns [<nltk.tokenize.casual.TweetTokenizer object at 0x7f80216950a0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f8022278670>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7fec0bbc70>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74970>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf747c0>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf74a90>, <nltk.tokenize.casual.TweetTokenizer object at 0x7f7febf748b0>, <nltk.tokenize.casual.TweetTokenizer …

How to parse guess_language to read 30000 tweets?

I am using guess_language to detect the language of tweets for a school project. I used pandas to read the .csv file, which has around 30000 rows. However, my problem is that guess_language can only read one tweet at a time: guess_language("Top story: 'Massive Mental Health Crisis' ") returns 'en'. I am very new at Python and been …
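One possible approach, sketched under assumptions: a detector that accepts one text at a time can simply be called once per row. The helper `detect_all` and the stand-in `fake_detect` below are hypothetical names used so the sketch runs without any third-party package; in the question, the detector would be guess_language.

```python
from typing import Callable, Iterable, List


def detect_all(texts: Iterable[str], detect: Callable[[str], str]) -> List[str]:
    """Call a one-text-at-a-time detector on every text and collect the results."""
    return [detect(text) for text in texts]


def fake_detect(text: str) -> str:
    # Stand-in for illustration only; NOT a real language detector.
    return "en" if text.isascii() else "unknown"


# Hypothetical sample rows standing in for the tweet column.
tweets = ["Top story: 'Massive Mental Health Crisis'", "¿Qué pasa?"]
languages = detect_all(tweets, fake_detect)
```

With pandas, the same idea is usually written as `df["lang"] = df["text"].apply(guess_language)`, assuming the tweets live in a column named `text` (the column name is an assumption here).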

How to strip string from punctuation except apostrophes for NLP

I am using the below “fastest” way of removing punctuation from a string: However, it removes all punctuation, including apostrophes, from tokens such as shouldn’t, turning it into shouldnt. The problem is that I am using the NLTK library for stopwords, and the standard stopwords don’t include such examples without apostrophes but instead have tokens that NLTK would generate if I used …
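The asker's own snippet is not shown in this excerpt. As a minimal sketch of one way to keep apostrophes, assuming the "fastest" method is the common `str.translate` approach, the apostrophe can be left out of the set of characters to delete. The function name below is hypothetical.

```python
import string

# Every ASCII punctuation character except the straight apostrophe.
PUNCT_TO_REMOVE = string.punctuation.replace("'", "")

# Translation table that deletes those characters and maps nothing else.
_DELETE_TABLE = str.maketrans("", "", PUNCT_TO_REMOVE)


def strip_punctuation_keep_apostrophes(text: str) -> str:
    """Remove ASCII punctuation from text but keep apostrophes."""
    return text.translate(_DELETE_TABLE)


cleaned = strip_punctuation_keep_apostrophes("Well, you shouldn't do that!")
```

Note that `string.punctuation` only covers ASCII, so a typographic apostrophe (’) is never removed by this table either way; normalizing it to `'` first may be needed for stopword matching.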
