How can I convert np.nan into the new pd.NA format, given that the pd.DataFrame comprises floats?
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
type(df.iloc[0, 0])  # numpy.float64 - I am expecting pd.NA
Using df.convert_dtypes() doesn't seem to work when df comprises floats. The conversion works fine, however, when df contains ints.
Answer
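Since pandas v1.2, convert_dtypes() handles float columns by default and converts them to the nullable Float64 dtype, so missing values become pd.NA. If you want integers instead, pass the convert_floating=False parameter.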
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7
df = df.convert_dtypes()
df.info()
output
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Float64
 1   B       1 non-null      Float64
dtypes: Float64(2)
memory usage: 104.0 bytes
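To confirm that the missing entries really are pd.NA and not np.nan, you can check a single cell directly. This is a minimal sketch that assumes pandas 1.2 or later; the name `converted` is only illustrative and does not come from the original question.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: mostly NaN, two real float values.
df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Assumes pandas >= 1.2, where float columns become nullable Float64.
converted = df.convert_dtypes()

# A missing cell in a Float64 column is returned as the pd.NA singleton.
print(converted.iloc[0, 0] is pd.NA)
print(converted.dtypes)
```

The identity check with `is` is used because comparing with `==` against pd.NA yields pd.NA again, not a boolean.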
Working with ints
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=['A', 'B'])
df.iloc[0, 1] = 1
df.iloc[3, 0] = 4
df = df.convert_dtypes(convert_floating=False)
df.info()
output
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   A       1 non-null      Int64
 1   B       1 non-null      Int64
dtypes: Int64(2)
memory usage: 104.0 bytes
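If you would rather not rely on dtype inference at all, a different route from the one in this answer is to cast explicitly with astype("Float64"). The sketch below is an alternative technique, not what convert_dtypes() does internally, and it assumes a pandas version that provides the nullable Float64 dtype (1.2 or later); the name `explicit` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=[0, 1, 2, 3], columns=["A", "B"])
df.iloc[0, 1] = 1.5
df.iloc[3, 0] = 4.7

# Explicit cast to the nullable float dtype (assumes pandas >= 1.2).
# NaN values in the source are treated as missing and stored as pd.NA.
explicit = df.astype("Float64")

print(explicit.dtypes)
print(explicit.iloc[0, 0] is pd.NA)
```

An explicit cast keeps every column as Float64 regardless of its values, which can be useful when the result must not depend on whether a column happens to hold only whole numbers.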