Skip to content
Advertisement

Comparison of a Dataframe column values with a list

Consider this Dataframe:

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30,35],
                   'C': [100, 150, 200, 250, 300, 350]})

This is the code to get values of column C, where it is the first row of each group (Column A):

firsts = df.groupby('A').first()['C']

So first will be: (100, 200, 300).

Now I want to add new column which it will be 1 if value of column C for row is in firsts otherwise it will be 0.

A B C D
1 10 100 1
1 15 150 0
2 20 200 1
2 25 250 0
3 30 300 1
3 35 350 0

I used this:

df['D'] = df['C'].apply(lambda x: 1 if x in firsts else 0)

But the output is:

A B C D
1 10 100 0
1 15 150 0
2 20 200 0
2 25 250 0
3 30 300 0
3 35 350 0

I appreciate if anyone explain why my solution is wrong and what is actual solution to this problem?

Advertisement

Answer

You can use isin method:

df['D'] = df.C.isin(firsts).astype(int)

df
#   A   B   C   D
#0  1   10  100 1
#1  1   15  150 0
#2  2   20  200 1
#3  2   25  250 0
#4  3   30  300 1
#5  3   35  350 0

The reason your approach fails is that python in operator check the index of a Series instead of the values, the same as how a dictionary works:

firsts
#A
#1    100
#2    200
#3    300
#Name: C, dtype: int64

1 in firsts
# True

100 in firsts
# False

2 in firsts
# True

200 in firsts
# False

Modifying your method as follows works:

firstSet = set(firsts)
df['C'].apply(lambda x: 1 if x in firstSet else 0)

#0    1
#1    0
#2    1
#3    0
#4    1
#5    0
#Name: C, dtype: int64
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement