I am trying to use Naive Bayes for spam-ham classification. I repeatedly get an error on a word here: The error message is just this: ‘hafta’ is the first word of the pandas DataFrame and of the training dataset. I tried the solution from this issue, which seemed similar to mine, but it didn’t work. I would appreciate any hint.
Tag: nlp
Create list of list tuples from reading a txt file
I have a txt file that looks like this: I’m trying to build tuples from this txt file, which I will later use to turn words into features. I want to have a list of lists that looks like this: Each blank line indicates that the sentence is over and should be added to the list at the given index, later after
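The excerpt does not show the file layout, so the following is only a minimal sketch under a stated assumption: each non-blank line holds whitespace-separated fields (for example a token and a tag), and a blank line ends a sentence. The function name `read_sentences` and the sample data are hypothetical.

```python
def read_sentences(lines):
    """Group lines into sentences: one list of tuples per blank-line-separated block."""
    sentences, current = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank (or whitespace-only) line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        current.append(tuple(fields))
    # Keep the last sentence if the input does not end with a blank line.
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample; the real file layout is not shown in the question.
sample = ["the DET\n", "cat NOUN\n", "\n", "it PRON\n", "sleeps VERB\n"]
print(read_sentences(sample))
```

With a real file, the same function can take the file object directly, e.g. `with open(path, encoding="utf-8") as f: read_sentences(f)`, where `path` is whatever the file is called.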
Count the number of times a group of words appear in a text
I have 4 lists of words that categorise something and a tokenised text by word. I would like to count the number of occurrences of the words in these lists in a certain text but as a sum of the words for each list. Therefore the results would show an occurrence of 10 animal words, 20 colour words, 6 food
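One way to get a single total per list is to count every token once and then sum the counts of each category's words. This is a sketch only: the category names, word lists, and the function name `count_by_category` are made up for illustration, and matching is assumed to be case-insensitive on exact tokens.

```python
from collections import Counter


def count_by_category(tokens, categories):
    """Return {category name: total occurrences of that category's words}."""
    token_counts = Counter(token.lower() for token in tokens)
    totals = {}
    for name, words in categories.items():
        # Deduplicate the word list so a repeated entry is not counted twice.
        unique_words = {word.lower() for word in words}
        totals[name] = sum(token_counts[word] for word in unique_words)
    return totals


# Hypothetical categories and text.
categories = {
    "animal": ["cat", "dog"],
    "colour": ["red", "blue"],
    "food": ["bread"],
}
tokens = "The red cat chased the blue dog past the red door".split()
print(count_by_category(tokens, categories))
```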
Keywords extraction in Python – How to handle hyphenated compound words
I’m trying to perform keyphrase extraction with Python, using KeyBERT and pke PositionRank. You can see an extract of my code below. and here the results: I would like hyphenated compound words (such as life-cycle in the example) to be treated as a single word, but I cannot understand how to exclude the hyphen (-) from the list of word separators. Thank
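A common approach is to tokenize with a regular expression that allows hyphens inside a word. The sketch below uses only the standard `re` module; the pattern and the `tokenize` helper are illustrative, not taken from the question's code. scikit-learn's `CountVectorizer` takes a `token_pattern` argument and KeyBERT's `extract_keywords` accepts a `vectorizer`, so a pattern like this could probably be passed through that route; whether pke PositionRank can be configured the same way is not covered here.

```python
import re

# One or more word characters, optionally followed by "-word" groups,
# so "life-cycle" and "state-of-the-art" stay in one piece.
TOKEN_PATTERN = r"(?u)\b\w+(?:-\w+)*\b"


def tokenize(text):
    """Lowercase the text and split it into tokens, keeping hyphenated compounds whole."""
    return re.findall(TOKEN_PATTERN, text.lower())


print(tokenize("The product life-cycle relies on state-of-the-art tools."))
```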
Counting word frequency in a sentence
I have two columns – one with sentences and the other with single words.

Sentence                                        word
“Such a day! It’s a beautiful day out there”    “beautiful”
“Such a day! It’s a beautiful day out there”    “day”
“I am sad by the sad weather”                   “weather”
“I am sad by the sad weather”                   “sad”

I want to count the frequency of the
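The question is cut off, so this sketch assumes the goal is to count how often each row's word occurs in that row's sentence. It uses plain Python tuples instead of a DataFrame, and `word_frequency` is a hypothetical helper name; matching is case-insensitive on whole tokens.

```python
import re


def word_frequency(sentence, word):
    """Count whole-word, case-insensitive occurrences of `word` in `sentence`."""
    # Tokens are runs of word characters, optionally joined by an apostrophe ("it's").
    tokens = re.findall(r"\w+(?:['’]\w+)*", sentence.lower())
    return tokens.count(word.lower())


rows = [
    ("Such a day! It's a beautiful day out there", "beautiful"),
    ("Such a day! It's a beautiful day out there", "day"),
    ("I am sad by the sad weather", "weather"),
    ("I am sad by the sad weather", "sad"),
]
for sentence, word in rows:
    print(word, word_frequency(sentence, word))
```

With pandas, the same helper could be applied row by row, which keeps the counting logic in one testable place.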
Is there a way to find the antonym (a word with the opposite meaning) of a word with Python? Do you know a dataset or an NLP toolkit?
Thank you for your help!
Answer: NLTK is a widely used NLP library and it includes many corpora, among them WordNet. See the code here: How to generate a list of antonyms for adjectives in WordNet using Python. NLTK documentation on using WordNet: https://www.nltk.org/howto/wordnet.html
Word2Vec + LSTM Good Training and Validation but Poor on Test
I am currently training a Word2Vec + LSTM model for Twitter sentiment analysis. I use the pre-trained GoogleNewsVectorNegative300 word embedding. I chose the pre-trained GoogleNewsVectorNegative300 because performance was much worse when I trained my own Word2Vec on my own dataset. My question is why the validation accuracy and loss were stuck at 0.88 and 0.34, respectively, during training. Then, my confusion
Job type (Full Time, Part Time) detection with a machine learning model in Python
I have a dataset of jobs with the columns “Title”, “Description”, “City”, etc., and a “Best Jobs” column. The output of the dataset is “Best Jobs”, which has two values (Yes, No): Yes means the job is part time and No means the job is full time. I want to train a machine learning model. Firstly I want to train
How can I apply an operation to a list but keep the original item when the result comes out null?
I have a list of lists of word groups in Turkish. I want to apply stemming and I found the turkishnlp package. Although it has some shortcomings, it often returns the right word. However, when I apply this to the list, I don’t want the structure of my list to change and I want the words that it doesn’t know to stay the
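A nested list comprehension keeps the list-of-lists shape, and `stem(word) or word` falls back to the original word whenever the stemmer returns `None` or an empty string. This is a sketch under assumptions: the turkishnlp call is not shown in the excerpt, so `toy_stem` below is a hypothetical stand-in, and a real stemmer may signal unknown words differently (for example by raising an exception).

```python
def stem_nested(groups, stem):
    """Apply `stem` to every word, keeping the nesting and any word it cannot handle."""
    return [[stem(word) or word for word in group] for group in groups]


# Hypothetical stand-in for a real stemmer: returns None for unknown words.
TOY_STEMS = {"kitaplar": "kitap", "evler": "ev"}


def toy_stem(word):
    return TOY_STEMS.get(word)


groups = [["kitaplar", "xyzq"], ["evler"]]
print(stem_nested(groups, toy_stem))
```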
How to label multi-word entities?
I’m quite new to data analysis (and Python in general), and I’m currently a bit stuck in my project. For my NLP task I need to create training data, i.e. find specific entities in sentences and label them. I have multiple CSV files containing the entities I am trying to find, many of them consisting of multiple words. I have tokenized
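One possible approach is to match each entity as a sequence of tokens and tag the matches with a BIO scheme (B- for the first token, I- for the rest, O elsewhere). BIO is a choice made for this sketch, not something the question specifies, and the entity dictionary, labels, and function name `label_entities` are hypothetical. Longer entities are tried first so that a shorter entity does not claim part of a longer one.

```python
def label_entities(tokens, entities):
    """Return one BIO label per token; `entities` maps a tuple of words to a label."""
    labels = ["O"] * len(tokens)
    lowered = [token.lower() for token in tokens]
    # Longest phrases first, so "new york city" wins over "new york".
    patterns = sorted(entities.items(), key=lambda item: len(item[0]), reverse=True)
    for phrase, label in patterns:
        phrase = tuple(word.lower() for word in phrase)
        n = len(phrase)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            span = range(start, start + n)
            is_match = tuple(lowered[start:start + n]) == phrase
            is_free = all(labels[i] == "O" for i in span)
            if is_match and is_free:
                labels[start] = "B-" + label
                for i in span[1:]:
                    labels[i] = "I-" + label
    return labels


# Hypothetical entities; in practice they would be loaded from the CSV files.
entities = {("new", "york"): "LOC", ("new", "york", "city"): "LOC"}
tokens = ["She", "moved", "to", "New", "York", "City", "from", "New", "York"]
print(list(zip(tokens, label_entities(tokens, entities))))
```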