I have a pandas data frame with some NaN values, which I have replaced with an empty string, `""`.
Now one of my functions does the following:
```python
for word in row['TEXT'].split():
    sum_prob += math.log((dict_list[i].get(word, 0) + 10) / (total_dict.get(word, 0) + 90))
text_feature_responseCoding[row_index][i] = math.exp(sum_prob / len(row['TEXT'].split()))
```
Since I have replaced the NaN values with `""`, I am getting a division by zero error.
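For reference, the failure can be reproduced without the DataFrame. Splitting an empty string yields an empty list, so the loop body never runs and the divisor `len(...)` is 0. In this minimal sketch, `sum_prob = 0.0` is an assumed initial value for the accumulator, which the question does not show:

```python
import math

text = ""             # what a NaN cell holds after being replaced with ""
words = text.split()  # "".split() returns [], so the loop body never runs
sum_prob = 0.0        # assumed initial value of the accumulator

print("   ".split())  # whitespace-only text also splits to [], prints: []

try:
    feature = math.exp(sum_prob / len(words))  # len(words) == 0
except ZeroDivisionError as exc:
    print(type(exc).__name__)  # prints: ZeroDivisionError
```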
What is the appropriate way to fill the NaN values so that I can get rid of this error?
Answer
You could filter your df to keep only rows where `'TEXT'` is not null, is not an empty string, etc., and then iterate through the filtered df. These toy examples show some of the ways to filter the df:
```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '']})

# only non-null rows
df_filtered1 = df[df['TEXT'].notnull()]

# only rows with text other than '' (NaN rows are kept, because NaN != '' is True)
df_filtered2 = df[df['TEXT'] != '']

# non-null and text not merely ''
df_filtered3 = df[(df['TEXT'].notnull()) & (df['TEXT'] != '')]

# non-null, and require a letter (you may be fine with only digits, but if not, this may be useful).
# Use [A-Za-z] rather than [A-z], which also matches [ \ ] ^ _ and the backtick;
# na=False makes NaN count as "no match" instead of propagating NaN into the mask.
df_filtered4 = df[(df['TEXT'].notnull()) & (df['TEXT'].str.contains('[A-Za-z]', na=False))]
```
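Putting a filter in front of the question's loop could look like the sketch below. It assumes the per-row logic is factored into a function; `text_feature`, `class_counts` and `total_counts` are hypothetical names standing in for the question's loop, `dict_list[i]` and `total_dict`, the counts are made-up toy data, and the +10/+90 smoothing constants are copied from the question. The filter also drops whitespace-only strings, which `str.split()` turns into an empty list just like `''`, and the function keeps a defensive guard in case an empty row slips through anyway.

```python
import math

import numpy as np
import pandas as pd


def text_feature(text, class_counts, total_counts):
    """Geometric mean of the smoothed ratios from the question (hypothetical helper)."""
    words = text.split()
    if not words:
        # Defensive guard: avoids dividing by len(words) == 0.
        return float("nan")
    sum_prob = 0.0
    for word in words:
        sum_prob += math.log((class_counts.get(word, 0) + 10) / (total_counts.get(word, 0) + 90))
    return math.exp(sum_prob / len(words))


df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '', '   ']})

# Treat NaN as '', then keep rows with at least one non-whitespace character.
df_valid = df[df['TEXT'].fillna('').str.strip() != '']

class_counts = {'hello': 3, 'python': 5}   # hypothetical per-class word counts
total_counts = {'hello': 10, 'python': 8}  # hypothetical overall word counts

# Keyed by the original index label of each surviving row.
features = {idx: text_feature(text, class_counts, total_counts)
            for idx, text in df_valid['TEXT'].items()}
print(features)
```

`df_valid` keeps the original index labels (0 and 1 here), so each feature can be written back to the row it came from. What value the filtered-out rows should receive is a separate modelling decision.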