
How to use word_tokenize in a data frame

I have recently started using the nltk module for text analysis, and I am stuck at one point. I want to use word_tokenize on a dataframe so as to obtain all the words used in a particular row of the dataframe.

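For illustration, a minimal version of such a dataframe might look like the following sketch (the column name 'sentences' and the sample rows are placeholders, not the actual data):

import pandas as pd

# Hypothetical example data; the real column name and contents may differ.
df = pd.DataFrame({'sentences': ['This is a very good site.',
                                 'Keep up the good work!']})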

Basically, I want to separate all the words and find the length of each text in the dataframe.

I know word_tokenize works on a string, but how do I apply it to the entire dataframe?

Please help!

Thanks in advance…


Answer

You can use the apply method of the DataFrame API:

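A minimal sketch, assuming the text lives in a column named 'sentences' as in the illustrative dataframe above (the column names and sample rows are assumptions, not part of the original answer); with this sample data, printing the new column gives roughly the output shown below:

import pandas as pd
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt models; download them once if missing, e.g. nltk.download('punkt').
df = pd.DataFrame({'sentences': ['This is a very good site.',
                                 'Keep up the good work!']})

# Tokenize each row of the 'sentences' column with DataFrame.apply and a lambda.
df['tokenized_sents'] = df.apply(lambda row: word_tokenize(row['sentences']), axis=1)

print(df['tokenized_sents'])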

Output:

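0    [This, is, a, very, good, site, .]
1        [Keep, up, the, good, work, !]
Name: tokenized_sents, dtype: object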

To find the length of each text, use apply with a lambda function again:

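Continuing the sketch above, where df['tokenized_sents'] holds the token lists, counting the tokens per row might look like this (apply len to the raw 'sentences' column instead if character counts are wanted):

# Number of tokens in each row of the illustrative dataframe.
df['sents_length'] = df['tokenized_sents'].apply(lambda tokens: len(tokens))

print(df['sents_length'])
# 0    7
# 1    6
# Name: sents_length, dtype: int64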
User contributions licensed under: CC BY-SA