Using lambda to compare two columns

My dataframe is like this: df = pd.DataFrame({'A': [1,2,3], 'B': [1,4,5]})

If column A has the same value as column B, output 1, else 0.

I want output like this:

    A   B   is_equal
0   1   1   1
1   2   4   0
2   3   5   0

I figured out that df['is_equal'] = np.where((df['A'] == df['B']), 1, 0) works fine.

But I want to use a lambda here because I used a similar line in another case before. df['is_equals'] = df.apply(lambda x: 1 if df['A']==1 else 0, axis=1) doesn't work: it throws ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Why does this error happen, and how can I fix the code?

Thank you in advance.

Answer

What you are attempting is very inefficient, so do not do it: .apply should not be used when a vectorized solution is possible. The best solution is:

df['is_equal'] = (df['A'] == df['B']).astype(int)

But if you insist on a lambda, compare the values of the row that is passed in, not the whole columns of df:

df.apply(lambda row: int(row['A'] == row['B']), axis=1)
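As for why the original attempt failed: inside the lambda, df['A'] still refers to the whole column rather than the current row, so df['A']==1 is a boolean Series, and `if` cannot reduce a Series to a single True or False. A minimal sketch with the dataframe from the question (the variable names mask, error_message, row_wise and vectorized are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 5]})

# Comparing a whole column gives one boolean per row, not a single bool.
mask = df['A'] == 1

# Forcing that Series into a single truth value is what raises the error.
try:
    bool(mask)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With axis=1 the lambda receives one row at a time, so row['A'] and
# row['B'] are scalars and the comparison yields a single boolean.
row_wise = df.apply(lambda row: int(row['A'] == row['B']), axis=1)

# The vectorized form compares the two columns element-wise in one step.
vectorized = (df['A'] == df['B']).astype(int)

print(row_wise.tolist(), vectorized.tolist())
```

Both forms produce the same column; the difference is only in how the comparison is carried out.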

The apply version is 2.5 times slower than the astype version. The original np.where is the fastest.
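If you want to check the relative speed yourself, a rough sketch with timeit could look like the following. The helper names (with_where, with_astype, with_apply) and the repeat count are arbitrary choices for illustration, and the numbers you get will depend on your machine, your pandas version and the size of the dataframe:

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 5]})


def with_where():
    # np.where picks 1 where the columns match and 0 elsewhere
    return np.where(df['A'] == df['B'], 1, 0)


def with_astype():
    # element-wise comparison, then booleans cast to integers
    return (df['A'] == df['B']).astype(int)


def with_apply():
    # row-by-row comparison through a Python-level lambda
    return df.apply(lambda row: int(row['A'] == row['B']), axis=1)


candidates = {'np.where': with_where, 'astype': with_astype, 'apply': with_apply}

# Total seconds for 200 calls of each variant.
timings = {name: timeit.timeit(fn, number=200) for name, fn in candidates.items()}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f} s for 200 calls")
```

On a three-row dataframe the differences are small in absolute terms; the gap between apply and the vectorized forms is expected to grow with the number of rows.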

User contributions licensed under: CC BY-SA