Skip to content
Advertisement

Compare two side by side column in pandas dataframe and colour selected cells simultaneously

I have a DataFrame say – df

data = [['00898','22123'],['87','99878'],['22123','66454'],['00898','87'],
        ['2109','312'],['6412','3078'],['6412','3078'],['66454','03158'],['3268','87']]
df = pd.DataFrame(data,columns = ['emp_ids','master_emp_ids'])
df
emp_ids    master_emp_ids
0   00898   22123
1   87      99878
2   22123   66454
3   00898   87
4   2109    312
5   6412    3078
6   6412    3078
7   66454   03158
8   3268    87

Now I want to check whether all the id’s of the column emp_ids are present in the the column master_emp_ids or not. So each and every id of emp_ids should get compared with each and every id of master_emp_ids. And there should be a column created named status, where it gives yes if present and no if not. Basically, 87 in the first col is present in the second col so it’s a yes and also to color green that particular cell where it says ‘yes’. So my final DataFrame should be – df

data = [['no','00898','22123'],['yes','87','99878'],['yes','22123','66454'],['no','00898','87'],
        ['no','2109','312'],['no','6412','3078'],['no','6412','3078'],['yes','66454','03158'],['no','3268','87']]
df = pd.DataFrame(data,columns = ['Status','emp_ids','master_emp_ids'])
df
status  emp_ids   master_emp_ids
0   no  00898     22123
1   yes 87        99878
2   yes 22123     66454
3   no  00898     87
4   no  2109      312
5   no  6412      3078
6   no  6412      3078
7   yes 66454     03158
8   no  3268      87

please help me to get my desired OP.

Thanks!!!

Advertisement

Answer

Use numpy.where with Series.isin for add new last column:

df['status'] = np.where(df['emp_ids'].isin(df['master_emp_ids']), 'yes','no')
print (df)
  Status emp_ids master_emp_ids status
0     no   00898          22123     no
1    yes      87          99878    yes
2    yes   22123          66454    yes
3     no   00898             87     no
4     no    2109            312     no
5     no    6412           3078     no
6     no    6412           3078     no
7    yes   66454          03158    yes
8     no    3268             87     no

If need first column use DataFrame.insert:

df.insert(0, 'status',  np.where(df['emp_ids'].isin(df['master_emp_ids']), 'yes','no'))
print (df)
  status Status emp_ids master_emp_ids
0     no     no   00898          22123
1    yes    yes      87          99878
2    yes    yes   22123          66454
3     no     no   00898             87
4     no     no    2109            312
5     no     no    6412           3078
6     no     no    6412           3078
7    yes    yes   66454          03158
8     no     no    3268             87

EDIT: For coloring is possible use style:

def color(val):
    if val == 'yes':
        return 'background-color: green'
    elif val == 'no':
        return 'background-color: red'
    else:
        return ''

    
df.insert(0, 'status',  np.where(df['emp_ids'].isin(df['master_emp_ids']), 'yes','no'))

df.style.applymap(color).to_excel('styled.xlsx', engine='openpyxl')
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement