Convert np.nan to pd.NA

How can I convert np.nan to the new pd.NA missing-value marker when the pd.DataFrame contains floats?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

df = df.convert_dtypes()

type(df.iloc[0, 0])  # numpy.float64 - I was expecting pd.NA

Calling df.convert_dtypes() doesn’t seem to work when df contains floats. The conversion works fine, however, when df contains ints.

Answer

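From pandas 1.2 onward this works with floats by default: convert_dtypes() converts float columns to the nullable Float64 dtype, so missing values become pd.NA. Float columns that hold only whole numbers are converted to Int64 instead, and passing convert_floating=False turns off the Float64 conversion while still allowing Int64.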

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

df = df.convert_dtypes()
df.info()

output

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   A       1 non-null      Float64
 1   B       1 non-null      Float64
dtypes: Float64(2)
memory usage: 104.0 bytes
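
To confirm that the missing entries really are pd.NA after the conversion, and not just that the dtype changed, a small check along these lines may help. It is only a sketch that assumes pandas 1.2 or later; the names converted, first_cell, is_na_singleton and dtype_names are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: mostly NaN, with two float values
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Assumes pandas >= 1.2, where float columns are converted by default
converted = df.convert_dtypes()

# Look at a cell that was np.nan before the conversion
first_cell = converted.iloc[0, 0]
is_na_singleton = first_cell is pd.NA

# Collect the dtype names as plain strings for easy inspection
dtype_names = [str(dtype) for dtype in converted.dtypes]

print(is_na_singleton)
print(dtype_names)
```

On pandas 1.2 or later the identity check should succeed and both columns should report the Float64 extension dtype, matching the df.info() output above.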

Working with ints

import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4

df = df.convert_dtypes(convert_floating=False)
df.info()

output

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Int64
 1   B       1 non-null      Int64
dtypes: Int64(2)
memory usage: 104.0 bytes
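
To see how convert_floating interacts with whole-number and fractional data, the following comparison may be useful. It is a sketch under the assumption of pandas 1.2 or later; the frames whole and fractional and the result variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# One column of whole-number floats, one of genuinely fractional floats
whole = pd.DataFrame({'A': [np.nan, 1.0, 4.0]})
fractional = pd.DataFrame({'A': [np.nan, 1.5, 4.7]})

# Whole-number floats: compare the default call with convert_floating=False
whole_default = str(whole.convert_dtypes().dtypes['A'])
whole_no_float = str(whole.convert_dtypes(convert_floating=False).dtypes['A'])

# Fractional floats: compare the default call with convert_floating=False
frac_default = str(fractional.convert_dtypes().dtypes['A'])
frac_no_float = str(fractional.convert_dtypes(convert_floating=False).dtypes['A'])

print(whole_default, whole_no_float)
print(frac_default, frac_no_float)
```

The whole-number column should come out as Int64 in both calls, because integer conversion is attempted regardless of convert_floating. The fractional column should become Float64 by default and stay plain float64 (keeping np.nan rather than pd.NA) when convert_floating=False is passed.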
User contributions licensed under: CC BY-SA