I am trying to combine two data sets using the following code:
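```python
import pandas as pd
pd1 = pd.read_csv('path1')  # 1456472 rows x 17 columns
pd2 = pd.read_csv('path2')  # 1083899 rows x 42 columns
# store the result under a new name; assigning to `pd` would shadow the pandas module
df = pd.merge(left=pd1, right=pd2, how='left', on='id')
```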
It fails with the following error:
MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64
How can I solve this on a laptop with 500 GB of disk and 8 GB of RAM? Thank you in advance.
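The shape in the error is a useful hint: pandas tried to allocate an int64 array with about 1.48 trillion entries, far more than the roughly 1.46 million rows of `pd1`, which suggests the merged result itself would be that large. In a left merge, every left row is repeated once per right row with the same `id` (or kept once if nothing matches), so an `id` that is heavily duplicated in both files multiplies the row count. Below is a minimal sketch that computes the resulting row count from key frequencies without performing the merge. It assumes the key column has no missing values, and the helper name `left_merge_row_count` is hypothetical.

```python
import pandas as pd


def left_merge_row_count(left: pd.DataFrame, right: pd.DataFrame, key: str) -> int:
    """Row count of pd.merge(left, right, how='left', on=key), without merging.

    Assumes `key` has no missing values in either frame.
    """
    # how many right rows share each key value
    right_counts = right[key].value_counts()
    # each left row is repeated once per matching right row; unmatched rows appear once
    per_left_row = left[key].map(right_counts).fillna(1)
    return int(per_left_row.sum())


left = pd.DataFrame({'id': [1, 1, 2, 3]})
right = pd.DataFrame({'id': [1, 1, 1, 2], 'v': range(4)})
print(left_merge_row_count(left, right, 'id'))  # 8 = 2*3 + 1 + 1
print(len(pd.merge(left, right, how='left', on='id')))  # 8
```

Calling `left_merge_row_count(pd1, pd2, 'id')` and looking at `pd2['id'].value_counts().head()` shows whether the join key is duplicated. If the result really has trillions of rows, it would not fit on a 500 GB disk either, whichever library performs the merge.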
Answer
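Try dask. It reads the CSV files in partitions and evaluates the merge lazily, so the whole dataset does not have to be loaded into memory at once. If you need a pandas DataFrame afterwards, you can convert the result with `.compute()`, ideally on another machine with more memory.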
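```python
import dask.dataframe as dd  # pip install "dask[dataframe]"
dd1 = dd.read_csv('path1')  # lazy, partitioned read
dd2 = dd.read_csv('path2')
# use a new name; rebinding `dd` would shadow the dask.dataframe module
merged = dd.merge(left=dd1, right=dd2, how='left', on='id')
# nothing has been computed yet; this writes each partition to its own file on disk
merged.to_csv('merged-*.csv', index=False)
```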