Given the following dataframe df:

    df = pd.DataFrame({'A':['Tony', 'Mike', 'Jen', 'Anna'], 'B': ['no', 'yes', 'no', 'yes']})

          A    B
    0  Tony   no
    1  Mike  yes
    2   Jen   no
    3  Anna  yes

I want to add another column that progressively counts the rows where df['B'] == 'yes':

          A    B  C
    0  Tony   no  0
    1  Mike  yes  1
    2   Jen   no  0
    3  Anna  yes  2

How can I do this?
Answer
You can use numpy.where with the cumsum of a boolean mask:

    import numpy as np

    m = df['B']=='yes'                      # True on the 'yes' rows
    df['C'] = np.where(m, m.cumsum(), 0)    # running count on 'yes' rows, 0 elsewhere

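To make the intermediate steps visible, here is a minimal self-contained sketch of the same idea on the question's data. The variable name `running` is only introduced for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

m = df['B'] == 'yes'               # boolean mask: False, True, False, True
running = m.cumsum()               # running count of True values: 0, 1, 1, 2
df['C'] = np.where(m, running, 0)  # keep the count on 'yes' rows, 0 elsewhere

print(df)
```

The cumulative sum keeps increasing on every row, so the mask is used a second time inside np.where to zero out the 'no' rows.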
Another solution is to take the cumulative sum of only the True values of the mask (selected with m[m]) and then add the 0 values back with reindex:

    m = df['B']=='yes'
    df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
    print (df)
          A    B  C
    0  Tony   no  0
    1  Mike  yes  1
    2   Jen   no  0
    3  Anna  yes  2

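The reindex approach can be broken into its steps as follows. This is only a sketch on the question's data, and the names `yes_only` and `counts` are hypothetical helpers added for readability.

```python
import pandas as pd

df = pd.DataFrame({'A': ['Tony', 'Mike', 'Jen', 'Anna'],
                   'B': ['no', 'yes', 'no', 'yes']})

m = df['B'] == 'yes'
yes_only = m[m]              # keeps only the True rows (index labels 1 and 3)
counts = yes_only.cumsum()   # 1, 2 on those rows

# reindex restores the dropped rows and fills them with 0
df['C'] = counts.reindex(df.index, fill_value=0)

print(df)
```

Because the 'no' rows are dropped before the cumulative sum, no second masking step is needed; reindex with fill_value=0 puts them back.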
Performance (results may differ on real data, so it is best to check first):

    np.random.seed(123)
    N = 10000
    L = ['yes','no']
    df = pd.DataFrame({'B': np.random.choice(L, N)})
    print (df)

    In [150]: %%timeit
         ...: m = df['B']=='yes'
         ...: df['C'] = np.where(m, m.cumsum(), 0)
         ...:
    1.57 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [151]: %%timeit
         ...: m = df['B']=='yes'
         ...: df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
         ...:
    2.53 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [152]: %%timeit
         ...: df['C'] = df.groupby('B').cumcount() + 1
         ...: df['C'].where(df['B'] == 'yes', 0, inplace=True)
         ...:
    4.49 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
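
Before comparing timings, it may be worth confirming that the three timed approaches give the same column. The sketch below does that on the same random data. The names `c_where`, `c_reindex` and `c_groupby` are hypothetical, and the groupby variant is written with a non-inplace where, which is an assumption made here to avoid modifying a column in place.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
N = 10000
df = pd.DataFrame({'B': np.random.choice(['yes', 'no'], N)})
m = df['B'] == 'yes'

# The three approaches, each stored in its own Series instead of df['C']
c_where = pd.Series(np.where(m, m.cumsum(), 0), index=df.index)
c_reindex = m[m].cumsum().reindex(df.index, fill_value=0)
c_groupby = (df.groupby('B').cumcount() + 1).where(m, 0)

# Compare the raw values so that dtype differences do not matter
same_reindex = np.array_equal(c_where.to_numpy(), c_reindex.to_numpy())
same_groupby = np.array_equal(c_where.to_numpy(), c_groupby.to_numpy())
print(same_reindex, same_groupby)
```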