Consider this DataFrame:

    import pandas as pd

    df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                       'B': [10, 15, 20, 25, 30, 35],
                       'C': [100, 150, 200, 250, 300, 350]})

This code gets the value of column C from the first row of each group (grouped by column A):

    firsts = df.groupby('A').first()['C']

So firsts will be (100, 200, 300). Now I want to add a new column D that is 1 if the row's value in column C is in firsts, and 0 otherwise:
A | B | C | D |
---|---|---|---|
1 | 10 | 100 | 1 |
1 | 15 | 150 | 0 |
2 | 20 | 200 | 1 |
2 | 25 | 250 | 0 |
3 | 30 | 300 | 1 |
3 | 35 | 350 | 0 |
I used this:

    df['D'] = df['C'].apply(lambda x: 1 if x in firsts else 0)

But the output is:
A | B | C | D |
---|---|---|---|
1 | 10 | 100 | 0 |
1 | 15 | 150 | 0 |
2 | 20 | 200 | 0 |
2 | 25 | 250 | 0 |
3 | 30 | 300 | 0 |
3 | 35 | 350 | 0 |
I would appreciate it if anyone could explain why my solution is wrong and what the correct solution to this problem is.
Answer
You can use the isin method:

    df['D'] = df.C.isin(firsts).astype(int)

    df
    #   A   B    C  D
    #0  1  10  100  1
    #1  1  15  150  0
    #2  2  20  200  1
    #3  2  25  250  0
    #4  3  30  300  1
    #5  3  35  350  0

Your approach fails because Python's in operator checks the index of a Series rather than its values, just as it checks the keys of a dictionary:

    firsts
    #A
    #1    100
    #2    200
    #3    300
    #Name: C, dtype: int64

    1 in firsts
    # True

    100 in firsts
    # False

    2 in firsts
    # True

    200 in firsts
    # False

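To make the index-versus-values distinction explicit, here is a minimal sketch that rebuilds firsts from the question's data and tests membership against the Series itself, its .values, and its .index. The variable names (label_hit, value_miss, value_hit, index_hit) are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})
firsts = df.groupby('A').first()['C']

# `in` on a Series looks at the index labels (the group keys 1, 2, 3).
label_hit = 1 in firsts            # True: 1 is an index label
value_miss = 100 in firsts         # False: 100 is a value, not a label

# Ask for the values (or the index) explicitly to say what you mean.
value_hit = 100 in firsts.values   # True: 100 is among the values
index_hit = 100 in firsts.index    # False: same check `in firsts` performs

print(label_hit, value_miss, value_hit, index_hit)
```

So the original lambda would also have worked with `x in firsts.values`, although isin is usually the more idiomatic choice.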
Modifying your method as follows works:

    firstSet = set(firsts)
    df['C'].apply(lambda x: 1 if x in firstSet else 0)

    #0    1
    #1    0
    #2    1
    #3    0
    #4    1
    #5    0
    #Name: C, dtype: int64

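As a quick sanity check, the following self-contained sketch runs both approaches on the question's data and confirms that they produce the same flags. The names via_isin, via_apply, and first_set are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': [10, 15, 20, 25, 30, 35],
                   'C': [100, 150, 200, 250, 300, 350]})
firsts = df.groupby('A').first()['C']

# Vectorized: isin compares against the values of `firsts`.
via_isin = df['C'].isin(firsts).astype(int)

# Element-wise: build a set from the values first, then test membership.
first_set = set(firsts)
via_apply = df['C'].apply(lambda x: 1 if x in first_set else 0)

df['D'] = via_isin
print(via_isin.tolist() == via_apply.tolist())
print(df)
```

Both approaches match on values, so a non-first row whose C value happens to equal some group's first value would also be flagged as 1.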