
Tag: nlp

KeyError on a certain word

I am trying to use Naive Bayes for spam-ham classification. I keep getting an error on one word here: The error message is just this: ‘hafta’ is the first word of the pandas dataframe and the training dataset. I tried the solution on this issue that seemed similar to mine, but it didn’t work out. I will appreciate any hint

Create list of list tuples from reading a txt file

I have a txt file that looks like this: and I’m trying to build tuples from it, which I will later turn from words into features. I want to end up with a list of lists that looks like this: The whitespace indicates that a sentence is over and should be added to the list at the given index, later after
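The grouping described above can be sketched roughly as follows. The question's sample file was not preserved in this excerpt, so the format here is an assumption: each non-blank line holds whitespace-separated fields (a token and a tag in the made-up sample), and "whitespace" is read as a blank line ending a sentence. The function name `read_sentences` and the sample data are hypothetical.

```python
import io


def read_sentences(lines):
    """Group one tuple per non-blank line into sentences split on blank lines."""
    sentences, current = [], []
    for line in lines:
        parts = line.split()
        if not parts:
            # Blank line: close the sentence collected so far, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(parts))
    # Keep the last sentence when the file does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample standing in for the real file (assumed format).
sample = io.StringIO("I PRON\nrun VERB\n\nDogs NOUN\nbark VERB\n")
sentences = read_sentences(sample)
print(sentences)
```

With a real file, the same function could be called on an open file handle, since it only iterates over lines.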

Keywords extraction in Python – How to handle hyphenated compound words

I’m trying to perform keyphrase extraction with Python, using KeyBERT and pke PositionRank. You can see an extract of my code below. and here are the results: I would like hyphenated compound words (such as life-cycle in the example) to be treated as a single word, but I cannot work out how to exclude the - from the list of word separators. Thank
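One way to picture the goal is a token pattern that treats a hyphen between word characters as part of the word. The sketch below uses only the standard `re` module; it is not the KeyBERT or pke API, and the names `TOKEN_PATTERN` and `tokenize` are made up for illustration.

```python
import re

# A word, optionally followed by more "-word" pieces, so "life-cycle" stays whole.
TOKEN_PATTERN = re.compile(r"\w+(?:-\w+)*")


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping hyphenated compounds."""
    return TOKEN_PATTERN.findall(text.lower())


tokens = tokenize("Life-cycle assessment of the product life-cycle")
print(tokens)
```

Where the extraction pipeline lets you supply your own token pattern (scikit-learn's `CountVectorizer` has a `token_pattern` parameter, for example), a pattern like this could be passed there. Whether that applies to the setup in the question is an assumption.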

Counting word frequency in a sentence

I have two columns – one with sentences and the other with single words.

Sentence                                        word
“Such a day! It’s a beautiful day out there”    “beautiful”
“Such a day! It’s a beautiful day out there”    “day”
“I am sad by the sad weather”                   “weather”
“I am sad by the sad weather”                   “sad”

I want to count the frequency of the
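A minimal sketch of the counting step, using the four rows from the question as plain tuples rather than a dataframe. The excerpt is cut off, so this assumes the goal is to count how often each row's word occurs in that row's sentence, ignoring case and punctuation. The helper name `count_word` is hypothetical.

```python
import re

# The rows quoted in the question, as (sentence, word) pairs.
rows = [
    ("Such a day! It's a beautiful day out there", "beautiful"),
    ("Such a day! It's a beautiful day out there", "day"),
    ("I am sad by the sad weather", "weather"),
    ("I am sad by the sad weather", "sad"),
]


def count_word(sentence, word):
    """Count whole-word, case-insensitive occurrences of word in sentence."""
    tokens = re.findall(r"[\w']+", sentence.lower())
    return tokens.count(word.lower())


counts = [count_word(sentence, word) for sentence, word in rows]
print(counts)
```

With pandas, the same helper could be applied row by row to the two columns. Matching whole tokens rather than substrings keeps "day" from also matching inside a word like "daylight".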

Word2Vec + LSTM Good Training and Validation but Poor on Test

Currently I am training my Word2Vec + LSTM model for Twitter sentiment analysis. I use the pre-trained GoogleNewsVectorNegative300 word embedding. I chose the pre-trained GoogleNewsVectorNegative300 because performance was much worse when I trained my own Word2Vec on my own dataset. The problem is that during training my validation accuracy and loss got stuck at 0.88 and 0.34 respectively. Then, my confusion

How to label multi-word entities?

I’m quite new to data analysis (and Python in general), and I’m currently a bit stuck in my project. For my NLP task I need to create training data, i.e. find specific entities in sentences and label them. I have multiple CSV files containing the entities I am trying to find, many of them consisting of multiple words. I have tokenized
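One common way to label multi-word entities in tokenized text is to scan the tokens and prefer the longest matching entity at each position. The sketch below assumes a BIO labelling scheme and a small in-memory entity dictionary. Both are assumptions, as are the function name `label_entities` and the example entities, since the question's files and target format are not shown.

```python
def label_entities(tokens, entities):
    """Return one BIO label per token, preferring the longest entity match.

    entities maps a tuple of lowercase tokens to an entity type.
    """
    labels = ["O"] * len(tokens)
    lowered = [token.lower() for token in tokens]
    max_len = max((len(entity) for entity in entities), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = tuple(lowered[i:i + n])
            if candidate in entities:
                entity_type = entities[candidate]
                labels[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical entity list; in the question it would be loaded from the CSV files.
entities = {("new", "york"): "LOC", ("new", "york", "times"): "ORG"}
tokens = ["She", "reads", "the", "New", "York", "Times", "in", "New", "York"]
labels = label_entities(tokens, entities)
print(list(zip(tokens, labels)))
```

Checking longer spans first means "New York Times" is labelled as one entity instead of being cut short at "New York".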
