
Use None instead of np.nan for null values in pandas DataFrame

I have a pandas DataFrame with mixed data types. I would like to replace all null values with None (instead of the default np.nan). For some reason, this appears to be nearly impossible.

In reality my DataFrame is read from a CSV file, but here is a simple DataFrame with mixed data types to illustrate my problem.

import numpy as np
import pandas as pd
df = pd.DataFrame(index=[0], columns=range(5))
df.iloc[0] = [1, 'two', np.nan, 3, 4]

I can’t do:

>>> df.fillna(None)
ValueError: must specify a fill method or value

nor:

>>> df[df.isnull()] = None
TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value

nor:

>>> df.replace(np.nan, None)
TypeError: cannot replace [nan] with method pad on a DataFrame

I used to have a DataFrame with only string values, so I could do:

>>> df[df == ""] = None

which worked. But now that I have mixed data types, it's a no-go.

For various reasons in my code, it would be helpful to be able to use None as my null value. Is there a way I can set the null values to None? Or do I just have to go back through my other code and make sure I'm using np.isnan or pd.isnull everywhere?
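On the fallback option in the question: pd.isnull (also available as pd.isna) already treats both None and np.nan as missing, whereas np.isnan only accepts numeric input and raises on None. The short check below illustrates the difference; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# pd.isnull (alias pd.isna) treats both None and np.nan as missing.
print(pd.isnull(None), pd.isnull(np.nan))  # True True

# np.isnan only accepts numeric input, so None raises a TypeError.
try:
    np.isnan(None)
    isnan_rejects_none = False
except TypeError:
    isnan_rejects_none = True
print("np.isnan(None) raises TypeError:", isnan_rejects_none)

# On an object column holding both kinds of null, pd.isnull flags each one.
s = pd.Series([1, "two", np.nan, None], dtype=object)
print(pd.isnull(s).tolist())  # [False, False, True, True]
```

So if the rest of the code checks for nulls with pd.isnull rather than np.isnan or `is None`, it works whichever null marker the DataFrame ends up holding.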


Answer

Use pd.DataFrame.where.
It keeps each value of df where the condition is met (here, where the value is not null) and uses None everywhere else:

df.where(df.notnull(), None)
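One caveat, hedged: a float64 column cannot store a Python None, so in more recent pandas versions `where` writes NaN back into float columns and the nulls there are unchanged. A minimal sketch of a workaround, assuming it is acceptable for every column to become object dtype, is to cast to object before calling `where`. The helper name `nulls_to_none` and the sample frame are hypothetical, not part of the original answer.

```python
import numpy as np
import pandas as pd


def nulls_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with every null (NaN or None) replaced by None.

    Hypothetical helper. Casting to object first lets every column hold a
    Python None; in a float64 column, where() would store NaN instead.
    """
    return df.astype(object).where(df.notnull(), None)


# Mixed-type frame similar to the one in the question.
df = pd.DataFrame({"a": [1.0, np.nan], "b": ["two", None], "c": [3, 4]})
out = nulls_to_none(df)
print(out)
print(out.dtypes)
```

The trade-off is that numeric columns lose their numeric dtype, so vectorized arithmetic on the result is slower and may need an explicit cast back.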


User contributions licensed under: CC BY-SA