Python MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64

I am trying to combine two data sets, using the following code:

import pandas as pd

pd1 = pd.read_csv('path1')  # 1456472 rows x 17 columns
pd2 = pd.read_csv('path2')  # 1083899 rows x 42 columns
merged = pd.merge(left=pd1, right=pd2, how='left', on='id')  # do not rebind the name pd

It fails with this error:

MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64

How can I solve this on a laptop with 500 GB of disk and 8 GB of RAM? Thank you in advance.

Answer

Try Dask. You can convert the result to pandas later, on another machine, if you want to.

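# pip install "dask[dataframe]"
import dask.dataframe as dd

dd1 = dd.read_csv('path1')
dd2 = dd.read_csv('path2')
# Dask is lazy: nothing is read or merged until you call .compute() or write the result out
merged = dd.merge(left=dd1, right=dd2, how='left', on='id')  # do not rebind the name dd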
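
One more thing that may be worth checking before merging: the error asks for 1483050607760 int64 values (8 bytes each, roughly 10.8 TiB), which is far more rows than either input has. That usually suggests that 'id' is duplicated in both files, so every repeated id on the left is paired with every matching row on the right. Below is a minimal sketch, assuming the key column has no missing values, that estimates the size of a left merge without performing it. The helper name estimate_left_merge_rows and the toy frames are hypothetical, not from the question.

```python
import pandas as pd

# Size of the array named in the error message: 8 bytes per int64 value
needed_tib = 1_483_050_607_760 * 8 / 2**40  # roughly 10.8

def estimate_left_merge_rows(left, right, key):
    """Estimate the row count of a left merge on `key` without running it.

    Assumption: `key` contains no missing values.
    """
    left_counts = left[key].value_counts()
    right_counts = right[key].value_counts()
    # Keys absent from `right` still produce one output row per left row
    matches = right_counts.reindex(left_counts.index, fill_value=1)
    return int((left_counts * matches).sum())

# Toy data (hypothetical): id 1 appears twice on the left, three times on the right
left = pd.DataFrame({"id": [1, 1, 2, 3], "a": range(4)})
right = pd.DataFrame({"id": [1, 1, 1, 2], "b": range(4)})

estimated = estimate_left_merge_rows(left, right, "id")  # 2*3 + 1*1 + 1 = 8
actual = len(pd.merge(left, right, how="left", on="id"))

# If one row per id on the right is acceptable, deduplicating removes the blow-up
deduped = right.drop_duplicates(subset="id")
estimated_after_dedup = estimate_left_merge_rows(left, deduped, "id")  # 4
```

If the estimate for the real files comes out close to the number in the error, switching to Dask alone is unlikely to help on a 500 GB disk, and deduplicating or aggregating one side on 'id' first is probably the more practical fix.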
User contributions licensed under: CC BY-SA