How can I reduce a DataFrame's memory footprint by finding the smallest suitable data types (dtypes)
for its numeric columns? For example:
       A        B    C         D
    0  1  1000000  1.1  1.111111
    1  2 -1000000  2.1  2.111111

    >>> df.dtypes
    A      int64
    B      int64
    C    float64
    D    float64
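The question does not show how the frame was built, so the construction below is an assumption: a minimal sketch that reproduces the same values and default dtypes, and prints the per-column memory use as a baseline.

```python
import pandas as pd

# Assumed construction: the original post only shows the printed frame.
df = pd.DataFrame(
    {
        "A": [1, 2],
        "B": [1000000, -1000000],
        "C": [1.1, 2.1],
        "D": [1.111111, 2.111111],
    }
)

print(df)
print(df.dtypes)
# Bytes used by each column's values (index excluded).
print(df.memory_usage(index=False))
```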
Expected result:
    >>> df.dtypes
    A       int8
    B      int32
    C    float32
    D    float32
    dtype: object
Answer
You can use the downcast parameter of pd.to_numeric, selecting the integer and float columns
with DataFrame.select_dtypes. This works from pandas 0.19+, as mentioned by @anurag, thank you:
    import pandas as pd

    # Column labels of the float and integer columns
    fcols = df.select_dtypes('float').columns
    icols = df.select_dtypes('integer').columns

    # Downcast each group to the smallest dtype that can hold its values
    df[fcols] = df[fcols].apply(pd.to_numeric, downcast='float')
    df[icols] = df[icols].apply(pd.to_numeric, downcast='integer')

    print(df.dtypes)
    # A       int8
    # B      int32
    # C    float32
    # D    float32
    # dtype: object
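To check how much memory the downcast actually saves, the same approach can be wrapped in a small helper and compared against the original frame. This is only a sketch: the helper name downcast_numeric and the way the example frame is built are assumptions for illustration, not part of pandas or of the original answer.

```python
import pandas as pd


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with float and integer columns downcast.

    Hypothetical helper: it applies pd.to_numeric column by column,
    leaving non-numeric columns untouched.
    """
    out = df.copy()
    for col in out.select_dtypes("float").columns:
        out[col] = pd.to_numeric(out[col], downcast="float")
    for col in out.select_dtypes("integer").columns:
        out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Assumed construction of the frame from the question.
df = pd.DataFrame(
    {
        "A": [1, 2],
        "B": [1000000, -1000000],
        "C": [1.1, 2.1],
        "D": [1.111111, 2.111111],
    }
)

small = downcast_numeric(df)

# Compare the bytes used by the column values before and after.
before = df.memory_usage(index=False).sum()
after = small.memory_usage(index=False).sum()
print(small.dtypes)
print(f"{before} bytes -> {after} bytes")
```

Returning a copy keeps the original frame intact, which makes it easy to compare values before and after; float32 stores fewer significant digits than float64, so it is worth checking that the reduced precision is acceptable for your data.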