I am trying to combine two datasets using the following code:
import pandas as pd

pd1 = pd.read_csv('path1')  # 1456472 rows x 17 columns
pd2 = pd.read_csv('path2')  # 1083899 rows x 42 columns
merged = pd.merge(left=pd1, right=pd2, how='left', on='id')
It fails with this error:
MemoryError: Unable to allocate 10.8 TiB for an array with shape (1483050607760,) and data type int64
How can I solve this on a laptop with a 500 GB disk and 8 GB of RAM? Thank you in advance.
Answer
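Try Dask. It works on the data in partitions instead of loading everything into memory at once, and you can convert the result to a pandas DataFrame later, on another machine with more memory, if you want to.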
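# pip install "dask[dataframe]"
import dask.dataframe as dd

dd1 = dd.read_csv('path1')
dd2 = dd.read_csv('path2')
# Store the result under a new name so the dd module is not overwritten.
# The merge is lazy: nothing is computed until you call .compute()
# or write the result out.
merged = dd.merge(left=dd1, right=dd2, how='left', on='id')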
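It may also be worth checking why the result is so large. The shape in the error, 1483050607760 elements at 8 bytes each, is about 10.8 TiB. That is close to the product of the two row counts (roughly 1.58 trillion), which suggests that `id` values repeat heavily in both files, so every repeated key multiplies its matches. Here is a rough sketch, using only the standard library, that streams the `id` column of each CSV and estimates how many rows a left join would produce. The helper names `count_ids` and `estimate_left_join_rows` are made up for this example, and it assumes both files have a header row containing an `id` column.

```python
import csv
from collections import Counter


def count_ids(path, key="id"):
    """Stream one CSV file and count how often each key value occurs."""
    counts = Counter()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row[key]] += 1
    return counts


def estimate_left_join_rows(left_counts, right_counts):
    """Estimate the row count of a left join on the counted key.

    Each left row appears once per matching right row,
    or exactly once if the key has no match on the right.
    """
    return sum(
        n_left * max(right_counts.get(key, 0), 1)
        for key, n_left in left_counts.items()
    )


# Tiny in-memory illustration: "a" occurs 2x on the left and 3x on the right,
# "b" occurs once on the left and has no match.
left = Counter(["a", "a", "b"])
right = Counter(["a", "a", "a"])
print(estimate_left_join_rows(left, right))  # 2*3 + 1 = 7
```

For the real files you would call `estimate_left_join_rows(count_ids('path1'), count_ids('path2'))`. Keys are compared as raw strings here, so treat the number as an estimate. If it comes out in the trillions, the merged table will probably be too big for an 8 GB machine with any library, and dropping duplicates or aggregating one side on `id` before merging may be needed as well.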