How to downcast numeric columns in Pandas?

How can I reduce a DataFrame's memory footprint by finding the smallest dtypes that can hold the values in its numeric columns? For example:

   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64

Expected result:

>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object

Answer

You can use the downcast parameter of to_numeric, selecting the integer and float columns with DataFrame.select_dtypes. This works from pandas 0.19+, as @anurag mentioned, thank you:

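import pandas as pd

# Column labels for every float and every integer column
fcols = df.select_dtypes('float').columns
icols = df.select_dtypes('integer').columns

# Downcast each column to the smallest dtype that can hold its values
df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')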

print(df.dtypes)
A       int8
B      int32
C    float32
D    float32
dtype: object
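
For anyone who wants to try this end to end, here is a minimal self-contained sketch that rebuilds the frame from the question and compares memory before and after. The helper name downcast_numeric is hypothetical (it is not part of pandas); it just wraps the same to_numeric calls and returns a copy instead of modifying the frame in place.

```python
import pandas as pd

# Sample frame from the question (built from Python lists, so int64/float64)
df = pd.DataFrame({
    "A": [1, 2],
    "B": [1000000, -1000000],
    "C": [1.1, 2.1],
    "D": [1.111111, 2.111111],
})


def downcast_numeric(frame):
    """Hypothetical helper: return a copy with numeric columns downcast."""
    out = frame.copy()
    # Floats go down to float32 at most; to_numeric does not go below that
    for col in out.select_dtypes("float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    # Integers go to the smallest signed integer dtype that fits the values
    for col in out.select_dtypes("integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


small = downcast_numeric(df)

print(small.dtypes)
# Bytes used by the column data only (index excluded)
print(df.memory_usage(index=False).sum(), "->", small.memory_usage(index=False).sum())
```

Keep in mind that float32 carries only about 7 significant decimal digits, so check that the reduced precision is acceptable for your data before downcasting float columns.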
User contributions licensed under: CC BY-SA