Pandas left join in place

I have a large data frame df and a small data frame df_right with 2 columns a and b. I want to do a simple left join / lookup on a without copying df.

I came up with this code, but I am not sure how robust it is:

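The snippet itself is not reproduced on this page. As a rough sketch only, a lookup of the kind described might merge just the key column and assign the result back onto df. The sample data, the extra column x, and the merge-then-assign form are all assumptions, not the asker's exact code.

```python
import pandas as pd

# Hypothetical sample data; only the column names a and b come from the question.
df = pd.DataFrame({"a": [1, 2, 3, 2], "x": [10.0, 20.0, 30.0, 40.0]})
df_right = pd.DataFrame({"a": [1, 2], "b": ["one", "two"]})

# Merge only the key column so the large frame is not duplicated,
# then write the looked-up values back as a new column of df.
# .to_numpy() discards the merge result's index, so assignment is positional.
looked_up = df[["a"]].merge(df_right, on="a", how="left")
df["b"] = looked_up["b"].to_numpy()
```

Positional assignment like this is only safe when the merge returns exactly one row per row of df, which is the robustness concern raised here.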

I know it fails when there are duplicated keys (see: pandas left join – why more results?).

Is there a better way to do this?

Related:

Outer merging two data frames in place in pandas

What are the exact downsides of copy=False in DataFrame.merge()?


Answer

You are almost there. There are 4 cases to consider:

  1. Neither df nor df_right has duplicated keys
  2. Only df has duplicated keys
  3. Only df_right has duplicated keys
  4. Both df and df_right have duplicated keys

Your code fails in cases 3 and 4 because the merge returns more rows than df has. To make it work, you need to decide which rows of df_right to drop before merging, so that every merge reduces to case 1 or 2.
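To make the row-count problem concrete, here is a small sketch with made-up frames (the names unique_right and dup_right are illustrative only). It compares a left merge against a right frame with unique keys and one with a repeated key, and shows that the validate argument of merge can be used to catch the second situation early.

```python
import pandas as pd

# Hypothetical data: df has a repeated key (2); the right frames differ
# only in whether key 2 is duplicated.
df = pd.DataFrame({"a": [1, 2, 2]})
unique_right = pd.DataFrame({"a": [1, 2], "b": [10, 20]})
dup_right = pd.DataFrame({"a": [1, 2, 2], "b": [10, 20, 21]})

# Case 2: duplicates only in df, so the result has as many rows as df.
rows_unique = len(df.merge(unique_right, on="a", how="left"))

# Case 4: duplicates on both sides, so each df row with key 2 matches twice.
rows_dup = len(df.merge(dup_right, on="a", how="left"))

# validate="many_to_one" makes pandas raise instead of silently adding rows.
try:
    df.merge(dup_right, on="a", how="left", validate="many_to_one")
    guard_tripped = False
except pd.errors.MergeError:
    guard_tripped = True
```

Here rows_unique equals len(df), while rows_dup is larger, so assigning the merged column back onto df would fail with a length mismatch.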

For example, if you wish to keep the “first” value for each duplicated key in df_right, the following code works for all 4 cases above.

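A minimal sketch of the keep-first approach, assuming made-up sample data and a merge-then-assign pattern (the helper name right_first is hypothetical, and this is not necessarily the exact code originally posted):

```python
import pandas as pd

# Hypothetical data covering case 4: duplicated keys in both frames,
# plus a key (3) that has no match in df_right.
df = pd.DataFrame({"a": [1, 2, 2, 3]})
df_right = pd.DataFrame({"a": [1, 2, 2], "b": [10, 20, 21]})

# Keep only the first row per key so the right side has unique keys.
right_first = df_right.drop_duplicates(subset="a", keep="first")

# With unique right keys the left merge returns one row per row of df,
# in df's order, so the values can be assigned back positionally.
df["b"] = df[["a"]].merge(right_first, on="a", how="left")["b"].to_numpy()
```

Unmatched keys come back as NaN, which is why column b ends up as float in this example. Passing keep="last" instead would retain the last value per key.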

Alternatively, if column 'b' of df_right consists of numeric values and you want a summary statistic per key, such as the mean:

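As a sketch of the summary-statistic variant, the right frame can be collapsed to one row per key with groupby before merging. The sample data and the choice of the mean are assumptions for illustration; any aggregation that yields one value per key would fit the same pattern.

```python
import pandas as pd

# Hypothetical data: key 2 appears twice in df_right with different values.
df = pd.DataFrame({"a": [1, 2, 2, 3]})
df_right = pd.DataFrame({"a": [1, 2, 2], "b": [10, 20, 22]})

# Aggregate b per key; as_index=False keeps a as a regular column
# so the result can be merged on it directly.
right_mean = df_right.groupby("a", as_index=False)["b"].mean()

# The aggregated frame has unique keys, so the row count of df is preserved.
df["b"] = df[["a"]].merge(right_mean, on="a", how="left")["b"].to_numpy()
```

Swapping .mean() for .sum(), .min(), or .max() changes only which statistic is looked up.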
User contributions licensed under: CC BY-SA