I have a dataframe that contains data of employees, their managers and the projects they worked on. The dataframe (a bit simplified) looks like this:
   EmployeeID  ManagerID  ProjectID
0           2         18        111
1           3         17        111
2           2         17        119
3           3         22        121
4           6         22        121
5           6         18        111
6           6         17        113
7           6         17        116
I would like to get all employees that have worked with both manager 17 and manager 18. In this case that would be employee 2 and employee 6.
I know I can write a query to get all employees that worked with either manager 17 or 18 using:
df.query('ManagerID == 17 | ManagerID == 18')
But now I need to find all employees that have worked with both. Since the same employee-manager combination can appear multiple times in the dataframe, I can't use a simple count. I think I need a self join, but I don't really know how that can be done in pandas.
Answer
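You can combine DataFrame.drop_duplicates, DataFrame.pivot and DataFrame.dropna to keep only the EmployeeID values that appear under both managers. After the pivot, each manager becomes a column, so an employee who is missing one of the managers has a NaN in that column and is removed by dropna: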
df = df.query('ManagerID == 17 | ManagerID == 18')
#another solution for filter
#df = df.query('ManagerID in [17, 18]')
emp = (df.drop_duplicates(subset=['EmployeeID','ManagerID'])
         # pivot arguments are keyword-only in pandas 2.0+
         .pivot(index='EmployeeID', columns='ManagerID', values='ProjectID')
         .dropna()
         .index
         .tolist())
print(emp)
[2, 6]
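As an alternative, here is a minimal sketch of two other ways to get the same result. Neither comes from the answer above. The first counts distinct managers per employee with nunique, which sidesteps the duplicate-combination problem from the question. The second is the self join the question asks about, done with DataFrame.merge. The names required, emp_nunique and emp_join are made up for illustration, and the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "EmployeeID": [2, 3, 2, 3, 6, 6, 6, 6],
    "ManagerID": [18, 17, 17, 22, 22, 18, 17, 17],
    "ProjectID": [111, 111, 119, 121, 121, 111, 113, 116],
})

required = [17, 18]  # hypothetical name: managers an employee must have worked with

# Approach 1: keep only rows for the required managers, then count the
# number of distinct managers per employee. Repeated employee-manager
# rows do not inflate the count because nunique ignores duplicates.
filtered = df[df["ManagerID"].isin(required)]
counts = filtered.groupby("EmployeeID")["ManagerID"].nunique()
emp_nunique = counts[counts == len(required)].index.tolist()

# Approach 2: self join. Deduplicate the employee-manager pairs, split
# them by manager, and inner-join the two halves on EmployeeID.
pairs = df[["EmployeeID", "ManagerID"]].drop_duplicates()
with_17 = pairs[pairs["ManagerID"] == 17]
with_18 = pairs[pairs["ManagerID"] == 18]
emp_join = sorted(with_17.merge(with_18, on="EmployeeID")["EmployeeID"].tolist())

print(emp_nunique)
print(emp_join)
```

Both lists contain employees 2 and 6 for the sample data. The nunique version extends to more than two managers by adding to required, while the merge version needs one extra join per additional manager.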