How can I convert np.nan into the new pd.NA format when the pd.DataFrame contains floats?
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
type(df.iloc[0, 0])  # numpy.float64 - I am expecting pd.NA

Calling df.convert_dtypes() doesn't seem to work when df contains floats. The conversion works fine, however, when df contains ints.
Answer
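From pandas v1.2, convert_dtypes() converts float columns to the nullable Float64 dtype by default, so missing values become pd.NA. Passing convert_floating=False switches that conversion off; float columns whose non-missing values are all whole numbers are then still cast to Int64.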
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
df.info()
output
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Float64
 1   B       1 non-null      Float64
dtypes: Float64(2)
memory usage: 104.0 bytes
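To confirm that the missing entries really are pd.NA (which is what the question checks with type(df.iloc[0, 0])), a small sketch along these lines may help. It assumes pandas 1.2 or later; the names converted, first_is_na and dtype_names are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Assumes pandas >= 1.2, where float columns are converted to Float64.
converted = df.convert_dtypes()

# Check whether the missing entry is the pd.NA singleton rather than a float NaN.
first_is_na = converted.iloc[0, 0] is pd.NA
dtype_names = [str(t) for t in converted.dtypes]

print(first_is_na)
print(dtype_names)
```

Comparing with "is pd.NA" is deliberate: pd.NA == pd.NA evaluates to pd.NA rather than True, so an equality test would not give a usable boolean.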
Working with ints
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4
df = df.convert_dtypes(convert_floating=False)
df.info()
output
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Int64
 1   B       1 non-null      Int64
dtypes: Int64(2)
memory usage: 104.0 bytes
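As far as I can tell from the convert_dtypes documentation, whole-number floats are given an integer dtype even with the default convert_floating=True, as long as convert_integer is left on. The sketch below is one way to check this on your own pandas version (1.2 or later assumed); default_dtypes and float_dtypes are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4

# Defaults: whole-number floats are expected to be cast to nullable Int64.
default_dtypes = [str(t) for t in df.convert_dtypes().dtypes]

# Integer conversion switched off: the columns are expected to become Float64.
float_dtypes = [str(t) for t in df.convert_dtypes(convert_integer=False).dtypes]

print(default_dtypes)
print(float_dtypes)
```

If you want whole-number data to stay floating but still use pd.NA, convert_integer=False looks like the parameter to reach for.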