I have a DataFrame with a column “free text”, and I want to count how many characters and words each cell contains. Currently, I do it like this:
```python
import pandas as pd

d = {'free text': ["merry had a little lamb", "Little Jonathan found a chicken"]}
df = pd.DataFrame(data=d)
df['Chars'] = df['free text'].apply(str).apply(len)
df['Words'] = df['free text'].apply(lambda x: len(str(x).split()))
```
The problem is that it is pretty slow. I thought about using `np.where`, but I wasn't sure how.
I would appreciate your help here.
Answer
IIUC, you can try pandas' vectorized string methods `str.len()` and `str.count()`:
```python
df['Chars'] = df['free text'].str.len()
# Count runs of non-whitespace, which mirrors str.split(); NaN stays NaN
df['Words'] = df['free text'].str.count(r'\S+')
```
Sample dataframe used:
```python
import numpy as np
import pandas as pd
d = {'free text': ["merry had a little lamb", "Little Jonathan found a chicken", np.nan]}
df = pd.DataFrame(data=d)
```
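To see why counting runs of non-whitespace is safer than counting single spaces, here is a small check against the question's `apply` version. The series `s` and its irregular spacing are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: double space, leading/trailing spaces, and a missing value
s = pd.Series(["merry had a little lamb", "two  spaces", " padded ", np.nan])

# The question's apply-based approach (NaN is turned into the string 'nan')
words_apply = s.apply(lambda x: len(str(x).split()))

# Vectorized: count runs of non-whitespace, mirroring str.split()
words_regex = s.str.count(r'\S+')

# Naive space counting over-counts repeated or padding spaces
words_spaces = s.str.count(' ') + 1

print(pd.DataFrame({'apply': words_apply, 'regex': words_regex, 'spaces': words_spaces}))
```

Here the `spaces` column reports 3 words for both irregularly spaced rows (instead of 2 and 1), while `regex` agrees with `apply` on the real strings. Note also that `apply` counts 1 word for the missing row because `str(np.nan)` is `'nan'`, whereas the `.str` methods leave it as NaN.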
OR
via NumPy; note that NaN values are turned into empty strings here, so they get a count of 0 instead of NaN:
```python
arr = df['free text'].to_numpy(na_value='').astype(str)  # NaN -> ''
df['Chars'] = np.char.str_len(arr)
# spaces + 1 assumes single-spaced words; np.where keeps empty strings at 0
df['Words'] = np.where(df['Chars'] > 0, np.char.count(arr, ' ') + 1, 0)
```
Output of `df` with the `.str` approach (the missing row stays NaN, which is why the counts are floats):
```
                         free text  Chars  Words
0          merry had a little lamb   23.0    5.0
1  Little Jonathan found a chicken   31.0    5.0
2                              NaN    NaN    NaN
```
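Since the original complaint was speed, it is worth measuring on your own data. Below is a minimal timing sketch, assuming a hypothetical frame built by repeating the two sample sentences; the helper names `with_apply` and `with_str` are made up for illustration. On object-dtype columns the `.str` methods still loop over the values in Python internally, so the gain can be modest and depends on your pandas version and column dtype.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the two sample sentences repeated 20,000 times
df = pd.DataFrame({'free text': ["merry had a little lamb",
                                 "Little Jonathan found a chicken"] * 20_000})
col = df['free text']


def with_apply():
    # The question's original approach
    chars = col.apply(str).apply(len)
    words = col.apply(lambda x: len(str(x).split()))
    return chars, words


def with_str():
    # The .str approach from the answer
    return col.str.len(), col.str.count(r'\S+')


# Make sure both approaches agree before comparing their speed
a_chars, a_words = with_apply()
s_chars, s_words = with_str()
assert (a_chars == s_chars).all() and (a_words == s_words).all()

for fn in (with_apply, with_str):
    # Best of 3 repeats, each running the function 3 times
    print(fn.__name__, min(timeit.repeat(fn, number=3, repeat=3)))
```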