I have a pandas DataFrame with some NaN values, which I have replaced with "".
Now one of my functions does the following:
    for word in row['TEXT'].split():
        sum_prob += math.log((dict_list[i].get(word, 0) + 10) / (total_dict.get(word, 0) + 90))
    text_feature_responseCoding[row_index][i] = math.exp(sum_prob / len(row['TEXT'].split()))
Since I have replaced the NaN values with "", I am getting a division by zero error.
What is an appropriate way to fill the NaN values so that I can get rid of this particular error?
Answer
You could filter your df to keep only the rows where 'TEXT' is not null, not an empty string, etc., and then iterate over that filtered df. These toy examples show some of the ways to filter the df:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '']})

    # only non-null rows
    df_filtered1 = df[df['TEXT'].notnull()]

    # only rows with text other than '' (NaN rows are kept, since NaN != '' is True)
    df_filtered2 = df[df['TEXT'] != '']

    # non-null and text not merely ''
    df_filtered3 = df[(df['TEXT'].notnull()) & (df['TEXT'] != '')]

    # non-null, and require a letter (you may be fine with only digits, but if not, this may be useful)
    # '[A-Za-z]' is used because '[A-z]' also matches '[', '\', ']', '^', '_' and '`';
    # na=False makes NaN rows evaluate to False instead of NaN
    df_filtered4 = df[(df['TEXT'].notnull()) & (df['TEXT'].str.contains('[A-Za-z]', na=False))]
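The error comes from the final division: ''.split() returns an empty list, so len(row['TEXT'].split()) is 0. If you need to keep one output row per input row, rather than dropping rows, another option is to guard against empty text inside the loop. The following is only a sketch under assumptions: response_coding is a hypothetical helper, dict_list and total_dict are toy stand-ins for the dictionaries in the question, and the uniform fallback of 1 / n_classes for empty rows is my assumption, not something the question specifies.

```python
import math

import numpy as np
import pandas as pd


def response_coding(df, dict_list, total_dict):
    """Return one row of per-class scores for each row of df['TEXT'].

    Hypothetical helper: dict_list and total_dict stand in for the
    per-class and overall word-count dictionaries from the question.
    """
    n_classes = len(dict_list)
    features = np.zeros((len(df), n_classes))
    # enumerate gives a positional index, independent of the df index labels
    for pos, text in enumerate(df['TEXT']):
        # NaN is a float, so the isinstance check screens it out;
        # ''.split() is an empty list
        words = text.split() if isinstance(text, str) else []
        if not words:
            # Assumed fallback: a uniform score instead of dividing by zero
            features[pos] = 1.0 / n_classes
            continue
        for i in range(n_classes):
            sum_prob = 0.0
            for word in words:
                sum_prob += math.log(
                    (dict_list[i].get(word, 0) + 10)
                    / (total_dict.get(word, 0) + 90)
                )
            features[pos][i] = math.exp(sum_prob / len(words))
    return features


df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '']})
dict_list = [{'hello': 3, 'there': 1}, {'python': 2, 'fun': 4}]  # toy counts
total_dict = {'hello': 3, 'there': 1, 'python': 2, 'fun': 4}     # toy counts

features = response_coding(df, dict_list, total_dict)
```

With this guard the NaN values do not need to be filled at all, and the rows of features line up with the rows of df. Whether a uniform score is a sensible value for empty text depends on your model, so treat that choice as a placeholder.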