
compare 2 string columns in data frame and add 1 to key if different

I'm having a hard time figuring this out. I have a data frame with multiple columns, produced by merging two data frames.

ID   name_x   name_y   age_x   age_y
1    Steve    Steve    40      40
2    John     John     34      35

I have 2 lists of column names:

list_a = ['name_x', 'age_x']
list_b = ['name_y', 'age_y']

I need to compare each pair of columns (name_x with name_y, age_x with age_y) and, for every pair that differs, add 1 to the key column:

ID   name_x   name_y   age_x   age_y  key
1    Steve    Steve    40      40     0
2    John     John     34      35     1

I was trying to use something like this:

for a in list_a:
    for b in list_b:
        master.loc[master[a] != master[b], 'key'] = +1 


Answer

I would avoid loops. Is there any reason we can't do it directly? Let's try comparing the two blocks of columns element-wise and counting the mismatches in each row:

df['key'] = (df[list_a].values != df[list_b].values).sum(1)



    ID name_x name_y  age_x  age_y  key
0   1  Steve  Steve     40     40    0
1   2   John   John     34     35    1
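
For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is called df as in the answer, rebuilds the two sample rows from the question, and assumes list_a and list_b hold column names as strings, paired by position. The variable key_loop is just an illustrative name. The second half shows a loop version for comparison: the attempt in the question pairs every column in list_a with every column in list_b, and `= +1` assigns 1 instead of incrementing, so zip and `+=` are used here instead.

```python
import pandas as pd

# Sample frame rebuilt from the question's table.
df = pd.DataFrame({
    "ID": [1, 2],
    "name_x": ["Steve", "John"],
    "name_y": ["Steve", "John"],
    "age_x": [40, 34],
    "age_y": [40, 35],
})

# Column names as strings; list_a[i] is compared with list_b[i].
list_a = ["name_x", "age_x"]
list_b = ["name_y", "age_y"]

# Vectorized: element-wise comparison of the two column blocks,
# then count the mismatches in each row.
df["key"] = (df[list_a].to_numpy() != df[list_b].to_numpy()).sum(axis=1)

# Loop equivalent, for comparison: walk the pairs with zip and
# increment a running count wherever the pair differs.
key_loop = pd.Series(0, index=df.index)
for a, b in zip(list_a, list_b):
    key_loop += (df[a] != df[b]).astype(int)

print(df)
```

Both versions should give the same counts. One thing to watch after a merge: missing values compare as different (NaN != NaN is True), so a row with NaN in both columns of a pair would still be counted as a mismatch.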
User contributions licensed under: CC BY-SA