How to resolve TypeError: cannot use a string pattern on a bytes-like object – word_tokenize, Counter and spacy

My dataset is the sales transaction history of an online store. I need to create a category based on the text in the Description column. I have done some text pre-processing and clustering. This is what the head of the dataframe cat_df looks like:

   Description                          Text                                 Cluster9
0  WHITE HANGING HEART T-LIGHT HOLDER   white hanging heart t-light holder   1
1  WHITE METAL LANTERN                  white metal lantern                  4
2  CREAM CUPID HEARTS COAT HANGER       cream cupid hearts coat hanger       0
3  KNITTED UNION FLAG HOT WATER BOTTLE  knitted union flag hot water bottle  8
4  RED WOOLLY HOTTIE WHITE HEART        red woolly hottie white heart        1

I created a groupby for each cluster:

Now I want to tokenize and count the words per cluster index.

But I got the error quoted in the title: TypeError: cannot use a string pattern on a bytes-like object.

How do I convert cluster9[0] into just one long string so I can pass it to word_tokenize and Counter?
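
A minimal sketch of the conversion being asked for, using only the standard library. It assumes cluster9[0] is some iterable of the Text values for one cluster; the list `cluster_texts` below is a stand-in built from the two cluster-1 rows shown above, and `to_one_string` is a hypothetical helper name. The bytes check is there only in case some items are bytes rather than str, which is one possible trigger for this kind of TypeError.

```python
from collections import Counter

# Stand-in for cluster9[0] (assumption): the Text values of one cluster,
# taken from the two cluster-1 rows shown in the question.
cluster_texts = [
    "white hanging heart t-light holder",
    "red woolly hottie white heart",
]


def to_one_string(texts):
    """Join an iterable of str (or bytes) items into one space-separated string."""
    parts = []
    for item in texts:
        # Decode bytes so str-based tokenizers never receive a bytes-like object.
        if isinstance(item, bytes):
            item = item.decode("utf-8")
        parts.append(str(item))
    return " ".join(parts)


long_text = to_one_string(cluster_texts)

# Plain whitespace split keeps this sketch dependency-free; with NLTK installed,
# nltk.tokenize.word_tokenize(long_text) could be used here instead.
tokens = long_text.split()
word_counts = Counter(tokens)

print(long_text)
print(word_counts.most_common(2))
```

The key point is that the tokenizer receives a single str, not an array, Series, or bytes object.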

I also tried spacy.

Any help and suggestions will be appreciated. Thank you.

Answer

I took your data and created a dummy dataframe from it.

You will get the desired output.
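
One way this could look, as a minimal sketch and not necessarily the code this answer originally used: rebuild a dummy dataframe from the five rows in the question, join the Text values of each cluster into one string, and count tokens per cluster. The names `cluster_text` and `word_counts` are illustrative, and whitespace splitting stands in for word_tokenize so the sketch needs only pandas.

```python
from collections import Counter

import pandas as pd

# Dummy dataframe rebuilt from the five rows shown in the question.
cat_df = pd.DataFrame(
    {
        "Description": [
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "WHITE METAL LANTERN",
            "CREAM CUPID HEARTS COAT HANGER",
            "KNITTED UNION FLAG HOT WATER BOTTLE",
            "RED WOOLLY HOTTIE WHITE HEART",
        ],
        "Cluster9": [1, 4, 0, 8, 1],
    }
)
cat_df["Text"] = cat_df["Description"].str.lower()

# One long string per cluster: join all Text values in each group with a space.
cluster_text = cat_df.groupby("Cluster9")["Text"].apply(" ".join)

# Count whitespace-separated tokens per cluster, keyed by plain int cluster ids.
word_counts = {
    int(cluster): Counter(text.split()) for cluster, text in cluster_text.items()
}

for cluster, counts in word_counts.items():
    print(cluster, counts.most_common(3))
```

Because each value in `cluster_text` is already a single str, it can also be passed directly to word_tokenize or to a spaCy pipeline.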

User contributions licensed under: CC BY-SA