I have a dataframe that contains data of employees, their managers and the projects they worked on. The dataframe (a bit simplified) looks like this:
EmployeeID ManagerID ProjectID 0 2 18 111 1 3 17 111 2 2 17 119 3 3 22 121 4 6 22 121 5 6 18 111 6 6 17 113 7 6 17 116
I would like get all employees that have both worked with manager 17 and 18, in this case that would be employee 2 and employee 6.
I know I can write a query to get all employees that worked with either manager 17 or 18 using:
df.query('ManagerID == 17 | ManagerID == 18')
But now I would need to find all employees that have worked with bot, since the combination of a employee – manager can be found multiple times in the dataframe I can’t use a count. I think I would need an self join, but I don’t really know how that can be done in pandas.
Advertisement
Answer
You can use DataFrame.drop_duplicates
with DataFrame.pivot
and DataFrame.dropna
for all EmployeeID
exist for both managers:
df = df.query('ManagerID == 17 | ManagerID == 18') #another solution for filter #df = df.query('ManagerID in [17, 18]') emp = (df.drop_duplicates(subset=['EmployeeID','ManagerID']) .pivot('EmployeeID','ManagerID','ProjectID') .dropna() .index .tolist()) print (emp) [2, 6]