How can I reduce a DataFrame's memory footprint by finding the smallest suitable dtypes for its numeric columns? For example:
   A        B    C         D
0  1  1000000  1.1  1.111111
1  2 -1000000  2.1  2.111111

>>> df.dtypes
A      int64
B      int64
C    float64
D    float64
Expected result:
>>> df.dtypes
A       int8
B      int32
C    float32
D    float32
dtype: object
Answer
You can use the downcast parameter of to_numeric, selecting the integer and float columns with DataFrame.select_dtypes. This works from pandas 0.19+, as @anurag mentioned, thank you:
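import pandas as pd

# Column labels of the float and integer columns
fcols = df.select_dtypes('float').columns
icols = df.select_dtypes('integer').columns

# Downcast each group to the smallest dtype that can hold its values
df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')
print(df.dtypes)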
A       int8
B      int32
C    float32
D    float32
dtype: object
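
To check the effect end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question, applies the same downcast, and compares memory usage before and after. The helper name downcast_numeric is hypothetical (it is not part of pandas), and the sketch assumes the float columns start out as float64, as in the question.

```python
import pandas as pd


def downcast_numeric(df):
    """Hypothetical helper: return a copy of df with numeric columns downcast."""
    out = df.copy()
    fcols = out.select_dtypes('float').columns
    icols = out.select_dtypes('integer').columns
    # Skip a group entirely if the frame has no columns of that kind
    if len(fcols):
        out[fcols] = out[fcols].apply(pd.to_numeric, downcast='float')
    if len(icols):
        out[icols] = out[icols].apply(pd.to_numeric, downcast='integer')
    return out


# Sample data from the question
df = pd.DataFrame({
    'A': [1, 2],
    'B': [1000000, -1000000],
    'C': [1.1, 2.1],
    'D': [1.111111, 2.111111],
})

small = downcast_numeric(df)

# Total bytes used by each frame (index included)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()

print(small.dtypes)
print(before, after)
```

Working on a copy leaves the original frame untouched, so the two can be compared directly. Keep in mind that float32 holds only about 7 significant decimal digits, so it is worth checking that the reduced precision is acceptable for your data.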
