
pandas join tables on two columns without ordering of values

I would like to achieve what is described here: stackoverflow question, but using only standard pandas.

I have two dataframes. First:


Second:


I want to join the two dataframes so that the final dataframe is identical to the first one, but also has the book_count column with the corresponding values (and NaN where none is available).

I already wrote something like: joined_df = first_df.merge(second_df, on = ['first_employee', 'target_employee'], how = 'outer') and I get:


This is somewhat close to what I would like to achieve. However, the order of the values in first_employee and target_employee is not relevant: if the first dataframe has (Frida, Vincent) and the second has (Vincent, Frida), these two rows should be merged together (what matters are the values, not which column they are in).

In my resulting dataframe I get three extra rows:


These come from the merge treating the key columns as ordered: the 3 extra rows should instead be merged into the already existing pairs (Frida, Vincent), (Pablo, Vincent) and (Frida, Pablo).

Is there a way to do this using only standard pandas functions? (The question I cited at the beginning uses sqldf.)


Answer

I believe this is what you are looking for. Using np.sort along each row puts the values of the first two columns in alphabetical order, so (Vincent, Frida) and (Frida, Vincent) become the same pair and the merge works correctly.
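A minimal sketch of that idea follows. It is not necessarily the exact code of this answer. The toy data is made up for illustration (only the column names and the names Frida, Vincent and Pablo come from the question), and `key_a`, `key_b` and `with_sorted_keys` are hypothetical helper names. Instead of overwriting the name columns, this variant puts the sorted pair into helper key columns, so the rows of the first dataframe keep their original values and order.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real frames from the question are not reproduced here.
first_df = pd.DataFrame({
    "first_employee": ["Frida", "Pablo", "Frida", "Vincent"],
    "target_employee": ["Vincent", "Vincent", "Pablo", "Diego"],
})
second_df = pd.DataFrame({
    "first_employee": ["Vincent", "Vincent", "Pablo"],
    "target_employee": ["Frida", "Pablo", "Frida"],
    "book_count": [10, 5, 3],
})

pair_cols = ["first_employee", "target_employee"]
key_cols = ["key_a", "key_b"]  # hypothetical helper column names


def with_sorted_keys(df):
    """Return df plus two key columns holding each row's pair in sorted order."""
    # np.sort with axis=1 sorts within each row, so (Vincent, Frida)
    # and (Frida, Vincent) both yield the key (Frida, Vincent).
    sorted_pairs = np.sort(df[pair_cols].to_numpy(), axis=1)
    keys = pd.DataFrame(sorted_pairs, columns=key_cols, index=df.index)
    return df.join(keys)


left = with_sorted_keys(first_df)
# Keep only the keys and the value column from the second frame.
right = with_sorted_keys(second_df)[key_cols + ["book_count"]]

# how="left" keeps every row of first_df; unmatched pairs get NaN.
joined_df = left.merge(right, on=key_cols, how="left").drop(columns=key_cols)
print(joined_df)
```

A left merge is used here because the goal is a result identical to the first dataframe plus book_count. This sketch assumes the name columns contain no missing values and that each unordered pair appears at most once in the second dataframe; otherwise the merge would duplicate rows.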

User contributions licensed under: CC BY-SA