Skip to content
Advertisement

Matching Two Pandas DataFrames based on values in columns

I’m trying to match job candidates to mentors based on different several variables that would hopefully create a good match. There are two Pandas DataFrames (one for candidates and one for mentors) that I’m trying to connect based on experience, location, desired job, etc.

For example I have a mentor DataFrame that might look something like the below:

mentor_df = pd.DataFrame({
      "Mentor":["Bob", "Kate", "Joe", "Mark"],
      "Experience":[3, 4, 5, 4],
      "Location": ["US", "FR", "JP", "US"],
      "Industry": ["Tech", "Tech", "Media", "Medicine"]
    })
    Mentor  Experience Location Industry
0  Bob     3         US       Tech
1  Kate    4         FR       Tech
2  Joe     5         JP       Media
2  Mark    4         US       Medicine

Along with a corresponding candidate DataFrame that looks like the below:

candidate_df = pd.DataFrame({
      "Candidate":["Candidate 1", "Candidate 2", "Candidate 3"],
      "Experience":[4, 4, 5],
      "Location": ["US", "FR", "JP", "US"],
      "Industry": ["Tech", "Media", "Medicine"]
    })
   Candidate      Experience Location Industry
0  Candidate 1    4         US        Tech
1  Canidate  2    4         FR        Media
2  Canidate  3    5         JP        Medicine

The resulting DataFrame connecting the candidates to mentors might look something like the below.

    Mentor  Experience Location Candidate    Industry
0  Bob     3         US       NaN          NaN
1  Kate    4         FR       Candidate 2  Tech
2  Joe     5         JP       Candidate 3  Media
2  Mark    4         US       Candidate 1  Medicine

Note that the DataFrame for the mentors will be larger than the candidates as every candidate should receive.

Any thoughts on how to best approaches this problem will be greatly appreciated :)

P.S. I know this might read like a homework problem, but the application that this is intended for is to help match separating service members and veterans find jobs by pairing them with mentors in career fields they’re interested in.

Advertisement

Answer

@Henry is on the right path. You’ll need to modify your candidate dataframe to a) make sure all arrays are the same length (or add NaNs if you don’t have them, and b) tweak a bit to make sure you actually have some matches.

I used your mentor_df, and the following candidate_df:

    candidate_df = pd.DataFrame({
          "Candidate":["Candidate 1", "Candidate 2", "Candidate 3", "Candidate 4"],
          "Experience":[4, 4, 5, 4],
          "Location": ["US", "FR", "JP", "US"],
          "Industry": ["Tech", "Media", "Medicine", "Medicine"]
        })

Then the merge works fine:

merged = mentor_df.merge(candidate_df, how='left')

Output:

 Mentor  Experience Location  Industry    Candidate
0    Bob           3       US      Tech          NaN
1   Kate           4       FR      Tech          NaN
2    Joe           5       JP     Media          NaN
3   Mark           4       US  Medicine  Candidate 4

Note you need to get to the last row before you have both a candidate and a mentor since this is matching on experience, location, and industry, and unless all three match you get a NaN either in candidate or mentor.

Good luck!

User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement