How can I reduce a DataFrame's memory footprint by finding the smallest (minimal) dtypes
for its numeric columns? For example:
   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64
Expected result:
>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object
Answer
You can use the downcast parameter of to_numeric, selecting the integer and float columns
with DataFrame.select_dtypes. This works from pandas 0.19+, as @anurag mentioned (thank you):
fcols = df.select_dtypes('float').columns
icols = df.select_dtypes('integer').columns

df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')

print (df.dtypes)
A       int8
B      int32
C    float32
D    float32
dtype: object
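For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and compares memory use before and after. The variable names before and after and the use of DataFrame.memory_usage are my own additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": [1, 2],
    "B": [1000000, -1000000],
    "C": [1.1, 2.1],
    "D": [1.111111, 2.111111],
})

# Total bytes used (index included) before downcasting
before = df.memory_usage(deep=True).sum()

# Select float and integer columns separately
fcols = df.select_dtypes("float").columns
icols = df.select_dtypes("integer").columns

# Downcast each group to the smallest dtype that can hold its values
df[fcols] = df[fcols].apply(pd.to_numeric, downcast="float")
df[icols] = df[icols].apply(pd.to_numeric, downcast="integer")

# Total bytes used after downcasting
after = df.memory_usage(deep=True).sum()

print(df.dtypes)
print(f"memory: {before} -> {after} bytes")
```

Keep in mind that downcasting floats to float32 reduces precision, so it is worth checking that the values are still accurate enough for your use case.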