I’m trying to match job candidates to mentors based on different several variables that would hopefully create a good match. There are two Pandas DataFrames (one for candidates and one for mentors) that I’m trying to connect based on experience, location, desired job, etc.
For example I have a mentor DataFrame that might look something like the below:
mentor_df = pd.DataFrame({ "Mentor":["Bob", "Kate", "Joe", "Mark"], "Experience":[3, 4, 5, 4], "Location": ["US", "FR", "JP", "US"], "Industry": ["Tech", "Tech", "Media", "Medicine"] }) Mentor Experience Location Industry 0 Bob 3 US Tech 1 Kate 4 FR Tech 2 Joe 5 JP Media 2 Mark 4 US Medicine
Along with a corresponding candidate DataFrame that looks like the below:
candidate_df = pd.DataFrame({ "Candidate":["Candidate 1", "Candidate 2", "Candidate 3"], "Experience":[4, 4, 5], "Location": ["US", "FR", "JP", "US"], "Industry": ["Tech", "Media", "Medicine"] }) Candidate Experience Location Industry 0 Candidate 1 4 US Tech 1 Canidate 2 4 FR Media 2 Canidate 3 5 JP Medicine
The resulting DataFrame connecting the candidates to mentors might look something like the below.
Mentor Experience Location Candidate Industry 0 Bob 3 US NaN NaN 1 Kate 4 FR Candidate 2 Tech 2 Joe 5 JP Candidate 3 Media 2 Mark 4 US Candidate 1 Medicine
Note that the DataFrame for the mentors will be larger than the candidates as every candidate should receive.
Any thoughts on how to best approaches this problem will be greatly appreciated :)
P.S. I know this might read like a homework problem, but the application that this is intended for is to help match separating service members and veterans find jobs by pairing them with mentors in career fields they’re interested in.
Advertisement
Answer
@Henry is on the right path. You’ll need to modify your candidate dataframe to a) make sure all arrays are the same length (or add NaN
s if you don’t have them, and b) tweak a bit to make sure you actually have some matches.
I used your mentor_df, and the following candidate_df:
candidate_df = pd.DataFrame({ "Candidate":["Candidate 1", "Candidate 2", "Candidate 3", "Candidate 4"], "Experience":[4, 4, 5, 4], "Location": ["US", "FR", "JP", "US"], "Industry": ["Tech", "Media", "Medicine", "Medicine"] })
Then the merge works fine:
merged = mentor_df.merge(candidate_df, how='left')
Output:
Mentor Experience Location Industry Candidate 0 Bob 3 US Tech NaN 1 Kate 4 FR Tech NaN 2 Joe 5 JP Media NaN 3 Mark 4 US Medicine Candidate 4
Note you need to get to the last row before you have both a candidate and a mentor since this is matching on experience, location, and industry, and unless all three match you get a NaN
either in candidate or mentor.
Good luck!