I have a df with column “free text”. I wish to count how many characters and words each cell has. Currently, I do it like this:
    import pandas as pd

    d = {'free text': ["merry had a little lamb", "Little Jonathan found a chicken"]}
    df = pd.DataFrame(data=d)
    df['Chars'] = df['free text'].apply(str).apply(len)
    df['Words'] = df['free text'].apply(lambda x: len(str(x).split()))
The problem is that this is pretty slow. I thought about using np.where, but I wasn't sure how.
Would appreciate your help here.
Answer
IIUC, you can try str.len() and str.count():
    df['Chars'] = df['free text'].str.len()
    df['Words'] = df['free text'].str.count(' ') + 1
Sample dataframe used:
    import numpy as np
    import pandas as pd

    d = {'free text': ["merry had a little lamb", "Little Jonathan found a chicken", np.nan]}
    df = pd.DataFrame(data=d)
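One caveat: counting spaces and adding 1 only matches the word count when words are separated by exactly one space and there is no leading or trailing whitespace. The sketch below uses a hypothetical sample with irregular spacing to compare that with two whitespace-tolerant alternatives that stay within the str accessor; it is an illustration, not a claim about which is fastest.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: double space, trailing space, and a missing value.
df = pd.DataFrame(
    {"free text": ["merry  had a little lamb ", "Little Jonathan found a chicken", np.nan]}
)

# Counting spaces overcounts when spacing is irregular.
by_spaces = df["free text"].str.count(" ") + 1

# Split on runs of whitespace, then take the length of each resulting list.
by_split = df["free text"].str.split().str.len()

# Count runs of non-whitespace characters with a regular expression.
by_regex = df["free text"].str.count(r"\S+")

print(pd.DataFrame({"spaces": by_spaces, "split": by_split, "regex": by_regex}))
```

All three leave the missing row as a missing value; only the first is sensitive to extra spaces.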
Or via numpy. Here NaN is replaced with an empty string first, so NaN rows come out as 0 characters and 1 word instead of NaN:

    arr = df['free text'].to_numpy(na_value='').astype(str)
    df['Chars'] = np.char.str_len(arr)
    df['Words'] = np.char.count(arr, ' ') + 1
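If the NaN rows should stay NaN in the numpy version, one option is to record where the missing values were and put NaN back with np.where (which the question mentioned). This is a minimal sketch under that assumption; the variable names `missing`, `text`, `chars` and `words` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"free text": ["merry had a little lamb", "Little Jonathan found a chicken", np.nan]}
)

col = df["free text"]
missing = col.isna().to_numpy()               # True where the cell was NaN
text = col.to_numpy(na_value="").astype(str)  # NaN becomes "" so np.char can process it

chars = np.char.str_len(text)
words = np.char.count(text, " ") + 1          # assumes single spaces between words

# Restore NaN for rows that had no text; the result columns become float.
df["Chars"] = np.where(missing, np.nan, chars)
df["Words"] = np.where(missing, np.nan, words)

print(df)
```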
Output of df:

                             free text  Chars  Words
    0          merry had a little lamb   23.0    5.0
    1  Little Jonathan found a chicken   31.0    5.0
    2                              NaN    NaN    NaN
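Since the question is about speed, it may be worth timing the approaches on your own data rather than assuming one is faster. The following is a rough sketch using timeit; the benchmark frame (the two sample sentences repeated) and the helper names `with_apply` and `with_str` are assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical benchmark data: the two sample sentences repeated many times.
df = pd.DataFrame(
    {"free text": ["merry had a little lamb", "Little Jonathan found a chicken"] * 5000}
)


def with_apply(frame):
    # Word count from the question: Python-level split per cell.
    return frame["free text"].apply(lambda x: len(str(x).split()))


def with_str(frame):
    # Word count from the answer: count spaces via the str accessor.
    return frame["free text"].str.count(" ") + 1


# Check that both approaches agree on this data before comparing speed.
results_match = bool((with_apply(df) == with_str(df)).all())

timings = {
    name: timeit.timeit(lambda f=func: f(df), number=5)
    for name, func in [("apply", with_apply), ("str accessor", with_str)]
}
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 5 runs")
```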