I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of the default np.nan). For some reason, this appears to be nearly impossible.
In reality my DataFrame is read in from a csv, but here is a simple DataFrame with mixed data types to illustrate my problem.
import numpy as np
import pandas as pd
df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]
I can’t do:
>>> df.fillna(None)
ValueError: must specify a fill method or value
nor:
>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value
nor:
>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame
I used to have a DataFrame with only string values, so I could do:
>>> df[df == ""] = None
which worked. But now that I have mixed datatypes, it’s a no go.
For various reasons in my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I’m using np.isnan or pd.isnull everywhere?
Answer
Use pd.DataFrame.where.
It uses the df value when the condition is met, otherwise uses None.
df.where(df.notnull(), None)
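One caveat worth checking on your own data: the example frame in the question has object dtype throughout, but a frame read from a csv usually has float64 columns, and in recent pandas versions a float column may turn None straight back into NaN. A minimal sketch of a workaround, assuming that is the situation, is to cast to object before calling where. The frame and column names below are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for data loaded from a csv:
# column "a" is float64, column "b" holds strings.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["two", np.nan]})

# Cast to object first so that None is not coerced back to NaN,
# then keep the non-null values and substitute None everywhere else.
result = df.astype(object).where(df.notnull(), None)

print(result["a"].tolist())
print(result["b"].tolist())
```

The trade-off is that every column becomes object dtype, so numeric operations on the result will be slower; this is probably best done as a last step before handing the data to code that expects None.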