Pandas: add column with progressive count of elements meeting a condition

Question

Given the following dataframe df:

          A    B
    0  Tony   no
    1  Mike  yes
    2   Jen   no
    3  Anna  yes

I want to add another column that counts, progressively, the elements with df['B'] == 'yes':

          A    B  C
    0  Tony   no  0
    1  Mike  yes  1
    2   Jen   no  0
    3  Anna  yes  2

How can I do this?

Accepted Answer

You can use numpy.where with the cumsum of a boolean mask:

    m = df['B'] == 'yes'
    df['C'] = np.where(m, m.cumsum(), 0)

Another solution is to take the cumulative sum of only the True values of the mask (selected by filtering) and then add the 0 values back with reindex:

    m = df['B'] == 'yes'
    df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
    print (df)

          A    B  C
    0  Tony   no  0
    1  Mike  yes  1
    2   Jen   no  0
    3  Anna  yes  2

Performance (results on real data may differ, so it is best to test it first):

    np.random.seed(123)
    N = 10000
    L = ['yes','no']
    df = pd.DataFrame({'B': np.random.choice(L, N)})
    print (df)

    In [150]: %%timeit
         ...: m = df['B']=='yes'
         ...: df['C'] = np.where(m, m.cumsum(), 0)
         ...:
    1.57 ms ± 34.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

    In [151]: %%timeit
         ...: m = df['B']=='yes'
         ...: df['C'] = m[m].cumsum().reindex(df.index, fill_value=0)
         ...:
    2.53 ms ± 54.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [152]: %%timeit
         ...: df['C'] = df.groupby('B').cumcount() + 1
         ...: df['C'].where(df['B'] == 'yes', 0, inplace=True)
    4.49 ms ± 27.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
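To try the two mask-based approaches end to end, here is a minimal self-contained sketch. It assumes the four-row sample frame reconstructed from the printed output in the answer; the names c_where and c_reindex are introduced here only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame reconstructed from the output printed in the answer (assumption).
df = pd.DataFrame({
    "A": ["Tony", "Mike", "Jen", "Anna"],
    "B": ["no", "yes", "no", "yes"],
})

# Boolean mask: True where B equals 'yes'.
m = df["B"] == "yes"

# Approach 1: running count of True values, set to 0 where the mask is False.
c_where = pd.Series(np.where(m, m.cumsum(), 0), index=df.index)

# Approach 2: cumulative sum over the True rows only, then reindex to the
# full index so the rows that were filtered out are filled with 0.
c_reindex = m[m].cumsum().reindex(df.index, fill_value=0)

df["C"] = c_where
print(df)
```

Both variants should give the same column on this sample. Note that the reindex variant aligns on index labels, so it assumes df.index has no duplicate labels, whereas np.where works positionally.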

Answer