Tag: nlp

What does this “.children” attribute do?

I’m trying to understand how a Key-Bigram extractor works, and I cannot work out what the following block of code does. Here is the source code. Everything else is working fine and I understand it well; however, I cannot understand what child for child in possible_words.children does. Answer: token.children uses the dependency parse to get all tokens that directly depend on
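
To illustrate the idea of "direct dependents" without needing spaCy or a trained model, here is a minimal toy sketch. ToyToken, the children helper, and the hand-written head indices are all assumptions made for illustration; this is not spaCy's implementation, only a model of what iterating over a token's children yields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyToken:
    text: str
    head: int  # index of the token this one depends on (the root points to itself)


def children(tokens: List[ToyToken], i: int) -> List[ToyToken]:
    """Return the tokens whose head is token i, i.e. its direct dependents."""
    return [t for j, t in enumerate(tokens) if t.head == i and j != i]


# Hand-written (assumed) parse of "She bought red apples".
tokens = [
    ToyToken("She", 1),     # subject of "bought"
    ToyToken("bought", 1),  # root: its head is itself
    ToyToken("red", 3),     # modifier of "apples"
    ToyToken("apples", 1),  # object of "bought"
]

bought_children = [t.text for t in children(tokens, 1)]
apples_children = [t.text for t in children(tokens, 3)]
print(bought_children)  # ['She', 'apples']
print(apples_children)  # ['red']
```

Note that "red" is not a child of "bought": it depends on "apples", so it is only reachable through that token's own children.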

Gensim Word2Vec exhausting iterable

I’m getting the following prompt when calling model.train() from gensim word2vec. The only solutions I found while searching for an answer point to the iterable vs iterator difference, and at this point I have tried everything I could to solve this on my own. Currently, my code looks like this: The corpus variable is a list containing sentences, and each
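
The iterable vs iterator difference mentioned in the excerpt can be shown with the standard library alone. The sketch below is not the asker's code and does not call gensim; sentence_generator and RestartableCorpus are hypothetical names. It only demonstrates why a generator is empty on a second pass while an object with its own __iter__ can be traversed repeatedly, which matters for any trainer that reads the corpus more than once.

```python
def sentence_generator(corpus):
    """A generator: can be consumed exactly once."""
    for sentence in corpus:
        yield sentence.split()


class RestartableCorpus:
    """An iterable: each call to __iter__ starts a fresh pass."""

    def __init__(self, corpus):
        self.corpus = corpus

    def __iter__(self):
        for sentence in self.corpus:
            yield sentence.split()


corpus = ["the cat sat", "the dog ran"]  # made-up sample sentences

gen = sentence_generator(corpus)
gen_first_pass = len(list(gen))   # 2 sentences
gen_second_pass = len(list(gen))  # 0: the generator is exhausted

restartable = RestartableCorpus(corpus)
restartable_first_pass = len(list(restartable))   # 2
restartable_second_pass = len(list(restartable))  # 2 again

print(gen_first_pass, gen_second_pass)
print(restartable_first_pass, restartable_second_pass)
```

A plain list of tokenized sentences is also restartable, so wrapping is only needed when the sentences are streamed lazily.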

How to resolve TypeError: cannot use a string pattern on a bytes-like object – word_tokenize, Counter and spacy

My dataset is the sales transaction history of an online store. I need to create a category based on the texts in the Description column. I have done some text pre-processing and clustering. This is what the head of the dataframe cat_df looks like: Description Text Cluster9 0 WHITE HANGING HEART T-LIGHT HOLDER white hanging heart t-light holder 1 1 WHITE METAL
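
The excerpt does not show where the TypeError is raised, so the following is only a sketch of one common way this message arises: a regular expression compiled from a str pattern is applied to a bytes value. The tokenize helper and the sample value are assumptions for illustration; decoding to str before matching is one possible fix.

```python
import re

WORD = re.compile(r"\w+")  # a str pattern


def tokenize(value):
    """Decode bytes to str (assuming UTF-8) before applying the str pattern."""
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    return WORD.findall(value.lower())


raw = b"WHITE HANGING HEART T-LIGHT HOLDER"  # bytes, not str

try:
    WORD.findall(raw)  # str pattern on a bytes-like object
    raised = False
except TypeError:
    raised = True

tokens = tokenize(raw)
print(raised)
print(tokens)
```

If the column mixes str and bytes values, applying such a normalizing step to every row before tokenizing or counting keeps the downstream code on a single type.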

Regex: searching for words that start with @ or @

I want to create a regex in Python that finds words that start with @ or @. I have created the following regex, but the output contains one extra space in each string, as you can see. However, the output that I want is the following. I would be grateful if you could help me! Edit: @The fourth
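
The asker's regex is not visible in the excerpt, so this is only a sketch of one way to match such words without capturing surrounding whitespace. It assumes the prefix is the plain "@" character; if a second prefix character is intended, it could be added by replacing "@" with a character class. The sample text is made up.

```python
import re

# A lookbehind checks that the "@" starts a word (preceded by whitespace or the
# start of the string) without making that whitespace part of the match.
MENTION = re.compile(r"(?<!\S)@\w+")

text = "thanks @alice and @bob_42, not an email a@b.com"
mentions = MENTION.findall(text)
print(mentions)  # ['@alice', '@bob_42']
```

Because the whitespace is asserted rather than consumed, the returned strings carry no leading or trailing space.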

How to handle numbers embedded in text during NLP pre-processing?

I am trying to run the LDA algorithm on a data set of news articles. I understand that numbers must be removed during the pre-processing step, and I have written a simple regex to replace numbers with blanks. However, I would like to retain some numbers, since removing them can potentially change the context/topic. For example, [Desired] ‘The fourth
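
One possible approach, sketched under assumptions: remove only standalone runs of digits, keep digits that are attached to letters, and keep any number listed in an allowlist. The KEEP set, the strip_numbers helper, and the sample sentence are hypothetical and are not taken from the question.

```python
import re

KEEP = {"2019"}  # hypothetical allowlist of numbers that carry meaning


def strip_numbers(text, keep=KEEP):
    """Blank out standalone digit runs unless they appear in the allowlist."""

    def replace(match):
        return match.group(0) if match.group(0) in keep else " "

    # \b\d+\b matches only digit runs that are not glued to letters,
    # so tokens such as "G20" or "4th" are left alone.
    cleaned = re.sub(r"\b\d+\b", replace, text)
    return " ".join(cleaned.split())  # collapse the blanks left behind


sample = "G20 leaders met 3 times in 2019 for the 4th round"
result = strip_numbers(sample)
print(result)  # G20 leaders met times in 2019 for the 4th round
```

Which numbers belong in the allowlist depends on the corpus; the sketch only separates the removal rule from that decision.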

KeyError on a certain word

I am trying to use Naive Bayes for spam-ham classification. I am repeatedly getting a word error here: The error message is just this: ‘hafta’ is the first word of the pandas dataframe and the training dataset. I tried the solution on this issue that seemed similar to mine, but it didn’t work out. I will appreciate any hint
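
The failing line is not shown in the excerpt, so the cause here is unknown. As a sketch of one common source of a KeyError on a single word in Naive Bayes code, the example below looks up a word that is absent from a per-class count dictionary, and shows how a Counter with add-one smoothing avoids the error. The training sentences and the log_prob helper are hypothetical.

```python
import math
from collections import Counter

spam_docs = ["win cash now", "cash prize now"]  # made-up training data
counts = Counter(word for doc in spam_docs for word in doc.split())
vocab_size = len(counts)
total = sum(counts.values())


def log_prob(word):
    """Add-one smoothed log-likelihood; Counter returns 0 for unseen words."""
    return math.log((counts[word] + 1) / (total + vocab_size))


plain = dict(counts)
try:
    plain["hafta"]  # a plain dict raises KeyError for a missing word
    raised = False
except KeyError:
    raised = True

unseen = log_prob("hafta")
print(raised, unseen)
```

If the error instead comes from indexing a pandas object with the word as a label, the fix would be different; the sketch only covers the dictionary-lookup case.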