
Modifying a data frame containing NaN values so that I don't get an error on division

I have a pandas data frame with some NaN values, which I have replaced with an empty string:

""

Now one of my functions does the following:

for word in row['TEXT'].split():
    sum_prob += math.log((dict_list[i].get(word, 0) + 10) / (total_dict.get(word, 0) + 90))
text_feature_responseCoding[row_index][i] = math.exp(sum_prob / len(row['TEXT'].split()))

Since I have replaced the NaN values with "", I am getting

division by zero error

What is an appropriate way to fill the NaN values so that I can get rid of this particular error?


Answer

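The error comes from the final division: ''.split() returns an empty list, so len(row['TEXT'].split()) is 0 for every row where NaN was replaced by ''. Rather than filling those rows with something else, you could filter your df to keep only rows where 'TEXT' is not null, not an empty string, etc., then iterate through that filtered df. See these toy examples for some of the ways to filter the df: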

import numpy as np
import pandas as pd

df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '']})

# only non-null rows
df_filtered1 = df[df['TEXT'].notnull()]

# only rows with text other than ''
df_filtered2 = df[df['TEXT'] != '']

# non null and text not merely ''
df_filtered3 = df[(df['TEXT'].notnull()) & (df['TEXT'] != '')]

# non null, and require a letter (you may be fine with only digits, but if not, this may be useful)
# na=False treats NaN as "no match"; '[A-z]' would also match the symbols [ \ ] ^ _ `
df_filtered4 = df[df['TEXT'].str.contains('[A-Za-z]', na=False)]
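To tie the filtering idea back to the loop in the question, here is a minimal sketch that skips any row with no usable words instead of dividing by zero. The function name response_coding and the toy counts in dict_list and total_dict are hypothetical, chosen only for illustration; the smoothing constants 10 and 90 are taken from the question. Skipped rows are left as NaN in the output, which is an assumption about what you want for them.

```python
import math

import numpy as np
import pandas as pd


def response_coding(df, dict_list, total_dict):
    """Return one smoothed geometric-mean probability per row and per dictionary.

    Rows whose TEXT is NaN, empty, or whitespace-only are skipped and stay NaN.
    """
    features = np.full((len(df), len(dict_list)), np.nan)
    for pos, text in enumerate(df['TEXT']):
        # NaN is a float, so the isinstance check skips it as well
        words = text.split() if isinstance(text, str) else []
        if not words:
            continue  # nothing to average over, so avoid dividing by zero
        for i, class_counts in enumerate(dict_list):
            sum_prob = sum(
                math.log((class_counts.get(w, 0) + 10) / (total_dict.get(w, 0) + 90))
                for w in words
            )
            features[pos, i] = math.exp(sum_prob / len(words))
    return features


# Hypothetical toy counts, only to make the sketch runnable
df = pd.DataFrame({'TEXT': ['hello there', 'python fun', np.nan, '']})
dict_list = [{'hello': 3, 'python': 1}, {'fun': 2}]
total_dict = {'hello': 3, 'python': 1, 'fun': 2}
features = response_coding(df, dict_list, total_dict)
```

Checking the word list rather than the raw string also covers whitespace-only text such as ' ', which a comparison against '' would let through.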

User contributions licensed under: CC BY-SA