Performance tuning: string wordcount in df

I have a df with column “free text”. I wish to count how many characters and words each cell has. Currently, I do it like this:

The problem is that it is pretty slow. I thought about using np.where but wasn’t sure how. I would appreciate your help here.

Answer

IIUC:

You can try str.len() for the character count and str.count() for the word count:
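
A minimal sketch of that idea, not necessarily the exact code from this answer. The sample data and the new column names `char_count` and `word_count` are made up for illustration, and the `\S+` pattern (runs of non-whitespace) is one assumption about what counts as a word:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "free text" comes from the question.
df = pd.DataFrame({"free text": ["hello world", "a  b c", "", np.nan]})

text = df["free text"]

# Characters per cell; missing values stay NaN.
df["char_count"] = text.str.len()

# Words per cell: count runs of non-whitespace characters.
# Assumption: a "word" is any whitespace-delimited token.
df["word_count"] = text.str.count(r"\S+")

print(df)
```

Both calls are vectorized string accessors, so there is no need for a Python-level `apply` over the rows. Rows with NaN keep NaN in both new columns.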

Sample dataframe used:

OR

via NumPy, but note that you will get a count of 0 wherever NaNs are present:
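
A rough sketch of a NumPy-based variant, under stated assumptions rather than the answer's original code: missing values are replaced with empty strings first (which is why they end up with a count of 0), and words are assumed to be separated by exactly one space. The sample data and the column names `char_count` and `word_count` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "free text" comes from the question.
df = pd.DataFrame({"free text": ["hello world", "a b c", np.nan]})

# Assumption: treat NaN as an empty string so NumPy's string functions can run.
arr = np.array(df["free text"].fillna("").tolist(), dtype=str)

# Element-wise string length.
lengths = np.char.str_len(arr)
df["char_count"] = lengths

# Assumption: single-space separators, so words = spaces + 1 for non-empty cells.
# np.where keeps empty (formerly NaN) cells at 0 instead of 1.
df["word_count"] = np.where(lengths > 0, np.char.count(arr, " ") + 1, 0)

print(df)
```

If the text can contain repeated spaces, tabs, or newlines, the space-counting shortcut overcounts, and the `str.count()` approach is the safer choice.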

output of df:

User contributions licensed under: CC BY-SA