
Dask concatenate 2 dataframes into 1 single dataframe

Objective

To merge df_labelled, a file containing a labelled subset of the points, into df, which contains all the points.

What I have tried

Following "Simple way to Dask concatenate (horizontal, axis=1, columns)", I tried the code below:

JavaScript

But I get this error:

ValueError: Not all divisions are known, can’t align partitions. Please use set_index to set the index.

I also tried a left join of the two tables, but every label came out as NaN. Can you explain what I did wrong?

JavaScript

Is there any way I can achieve the expected result shown below? I can't do this in Pandas because there are so many points that it runs into memory issues.

Data

df (this file has all the points)

JavaScript

df_labelled (this file contains the labelled subset of the points)

JavaScript

Expected outcome

JavaScript


Answer

I think you get the error when you do something like this:

JavaScript

This happens because df and/or df_labelled has no index with known divisions, so Dask cannot align their partitions. Dask also doesn't support a MultiIndex the way Pandas does. Instead of relying on the index, specify the left and right keys explicitly when you have more than one key to merge on in Dask. This works for me:

JavaScript
User contributions licensed under: CC BY-SA