Find all ids that have 2 specific values in one column

I have a dataframe that contains data of employees, their managers and the projects they worked on. The dataframe (a bit simplified) looks like this:

    EmployeeID  ManagerID  ProjectID
0            2         18        111
1            3         17        111
2            2         17        119
3            3         22        121
4            6         22        121
5            6         18        111
6            6         17        113
7            6         17        116

I would like to get all employees that have worked with both manager 17 and manager 18; in this case that would be employee 2 and employee 6.

I know I can write a query to get all employees that worked with either manager 17 or 18 using:

df.query('ManagerID == 17 | ManagerID == 18')

But now I need to find all employees that have worked with both. Since the same employee–manager combination can appear multiple times in the dataframe, I can't simply use a count. I think I would need a self join, but I don't really know how that can be done in pandas.

Answer

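You can use DataFrame.drop_duplicates with DataFrame.pivot and DataFrame.dropna to keep only the EmployeeID values that appear for both managers. After filtering, the pivot produces one column per manager, so an employee who is missing either manager gets a NaN in that column and is removed by dropna: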

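# keep only the rows for the two managers of interest
df = df.query('ManagerID == 17 | ManagerID == 18')
# alternative way to write the same filter:
# df = df.query('ManagerID in [17, 18]')
# pivot needs unique (EmployeeID, ManagerID) pairs, so drop repeats first;
# keyword arguments are required by pivot in pandas 2.0+
emp = (df.drop_duplicates(subset=['EmployeeID', 'ManagerID'])
         .pivot(index='EmployeeID', columns='ManagerID', values='ProjectID')
         .dropna()
         .index
         .tolist())
print(emp)
[2, 6]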
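
Since the question asks about a count and a self join, here is a minimal sketch of both, not part of the original answer. It rebuilds the sample dataframe from the question; the names wanted, subset, counts, left and right are only illustrative. The first variant counts distinct managers per employee with nunique, which sidesteps the repeated employee–manager pairs; the second does the self join with DataFrame.merge.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "EmployeeID": [2, 3, 2, 3, 6, 6, 6, 6],
    "ManagerID": [18, 17, 17, 22, 22, 18, 17, 17],
    "ProjectID": [111, 111, 119, 121, 121, 111, 113, 116],
})

wanted = [17, 18]  # managers every returned employee must have worked with

# Variant 1: filter to the wanted managers, then count DISTINCT managers
# per employee, so repeated employee-manager pairs are only counted once.
subset = df[df["ManagerID"].isin(wanted)]
counts = subset.groupby("EmployeeID")["ManagerID"].nunique()
emp_groupby = counts[counts == len(wanted)].index.tolist()

# Variant 2: self join. Employees with manager 17 are inner-joined to
# employees with manager 18; only employees present on both sides remain.
left = df.loc[df["ManagerID"] == 17, ["EmployeeID"]].drop_duplicates()
right = df.loc[df["ManagerID"] == 18, ["EmployeeID"]].drop_duplicates()
emp_merge = left.merge(right, on="EmployeeID")["EmployeeID"].tolist()

print(emp_groupby)
print(emp_merge)
```

The nunique variant extends to more than two managers by changing wanted, whereas the merge variant needs one extra join per additional manager.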
