I have a dataframe that contains data of employees, their managers and the projects they worked on. The dataframe (a bit simplified) looks like this:
EmployeeID ManagerID ProjectID
0 2 18 111
1 3 17 111
2 2 17 119
3 3 22 121
4 6 22 121
5 6 18 111
6 6 17 113
7 6 17 116
I would like get all employees that have both worked with manager 17 and 18, in this case that would be employee 2 and employee 6.
I know I can write a query to get all employees that worked with either manager 17 or 18 using:
df.query('ManagerID == 17 | ManagerID == 18')
But now I would need to find all employees that have worked with bot, since the combination of a employee – manager can be found multiple times in the dataframe I can’t use a count. I think I would need an self join, but I don’t really know how that can be done in pandas.
Advertisement
Answer
You can use DataFrame.drop_duplicates
with DataFrame.pivot
and DataFrame.dropna
for all EmployeeID
exist for both managers:
df = df.query('ManagerID == 17 | ManagerID == 18')
#another solution for filter
#df = df.query('ManagerID in [17, 18]')
emp = (df.drop_duplicates(subset=['EmployeeID','ManagerID'])
.pivot('EmployeeID','ManagerID','ProjectID')
.dropna()
.index
.tolist())
print (emp)
[2, 6]