How to calculate with conditions in pandas?

Question

I have a dataframe like this, and I want to calculate a new column using the formula Value = A(where Time=1) + A(where Time=3); I don't want to use A(where Time=5). I know how to do it by selecting the cells needed for the formula, but are there better ways to perform the calculation? I suspect
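For reference, the sample frame can be rebuilt from the output printed in the accepted answer (columns Type, subType, Time, A). The sketch below shows one alternative to picking out cells by hand. It is not taken from the answers. It reshapes the data with `DataFrame.pivot` so that each Time value becomes a column, and the formula then reduces to a column addition. It assumes each (Type, subType, Time) combination occurs exactly once; `pivot` raises a `ValueError` on duplicates.

```python
import pandas as pd

# Sample data reconstructed from the output printed in the accepted answer
df = pd.DataFrame({
    "Type": ["X"] * 6 + ["Y"] * 6,
    "subType": (["a"] * 3 + ["b"] * 3) * 2,
    "Time": [1, 3, 5] * 4,
    "A": [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

# One row per (Type, subType) pair, one column per Time value
wide = df.pivot(index=["Type", "subType"], columns="Time", values="A")

# Value = A(Time=1) + A(Time=3); the Time=5 column is simply left out
wide["Value"] = wide[1] + wide[3]
print(wide)
```

To bring the result back onto the long frame, `wide["Value"]` can be reset to a regular column and merged on `Type` and `subType`.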

Accepted Answer

Use `Series.eq` with `DataFrame.groupby` and `Series.cumsum` to create the groups, then add:

```python
c1 = df.Time.eq(1)
c3 = df.Time.eq(3)
df['Value'] = (df.loc[c1 | c3]
                 .groupby(c1.cumsum())
                 .A
                 .transform('sum')
                 .loc[c1])
print(df)
```

Or, if you want to identify the rows by their non-equivalence with 5:

```python
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum')
                      .where(c.shift(fill_value=True)))

# Another option is map
c = df['Time'].eq(5)
c_cumsum = c.cumsum()
df['value'] = (c_cumsum.map(df['A'].mask(c)
                                   .groupby(c_cumsum)
                                   .sum())
                       .where(c.shift(fill_value=True)))
```

Output

```
   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9    NaN
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    NaN
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    NaN
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    NaN
11    Y       b     5  2    NaN
```

MISSING VALUES

To fill the missing values instead, drop the final `.where`:

```python
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum'))

# or method 1
# c1 = df.Time.eq(1)
# c3 = df.Time.eq(3)
# df['Value'] = (df.loc[c1 | c3]
#                  .groupby(c1.cumsum())
#                  .A
#                  .transform('sum'))
print(df)
```

Output

```
   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    9.0
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    3.0
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    9.0
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    0.0
```

Note that `c.cumsum()` increments on the Time == 5 row itself, so each Time == 5 row falls into the following block: it shows the next group's total, and the last row shows 0.0.

Or, filling all rows except those where Time is 5:

```python
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum')
                      .mask(c))

# or method 1
# c1 = df.Time.eq(1)
# c3 = df.Time.eq(3)
# df['Value'] = (df.loc[c1 | c3]
#                  .groupby(c1.cumsum())
#                  .A
#                  .transform('sum')
#                  .loc[c1 | c3])
print(df)
```

```
   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    NaN
```

Why not use apply here?

Even on a small data frame, the `apply` version is already slower (about 3.7 times slower in these IPython `%%timeit` measurements):

```python
%%timeit
(
    df.groupby(by=['Type', 'subType'])
      .apply(lambda x: x.loc[x.Time != 5].A.sum())  # sum A in each group, excluding Time == 5
      .to_frame('Value').reset_index()
      .pipe(lambda x: pd.merge(df, x, on=['Type', 'subType'], how='left'))
)
```

13.6 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

```python
%%timeit
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum')
                      .where(c.shift(fill_value=True)))
```

3.67 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
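The `cumsum`-based groups in the accepted answer are derived from row position. They therefore assume each (Type, subType) block is contiguous and ordered Time 1, 3, 5, as in the sample. If that ordering is not guaranteed, one possible sketch groups on the key columns themselves while still avoiding `apply`. This variant is not from the answer, and `add_value` is a hypothetical helper name. The frame is rebuilt from the printed output.

```python
import pandas as pd

# Sample data reconstructed from the output printed in the accepted answer
df = pd.DataFrame({
    "Type": ["X"] * 6 + ["Y"] * 6,
    "subType": (["a"] * 3 + ["b"] * 3) * 2,
    "Time": [1, 3, 5] * 4,
    "A": [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})


def add_value(frame):
    """Hypothetical helper: per (Type, subType), Value = A(Time=1) + A(Time=3)."""
    # Keep A only where Time is 1 or 3; other rows become NaN, which sum skips
    used = frame["A"].where(frame["Time"].isin([1, 3]))
    # Group on the key columns rather than on row position, so order does not matter
    total = used.groupby([frame["Type"], frame["subType"]]).transform("sum")
    # Like the answer's first output: show the total only on the Time == 1 row
    return frame.assign(Value=total.where(frame["Time"].eq(1)))


result = add_value(df)
print(result)

# The same helper applied to shuffled rows, then restored to the original order
shuffled = add_value(df.sample(frac=1, random_state=0))
print(shuffled.sort_index())
```

Dropping the final `.where(...)` fills every row of a group with its total, in the spirit of the "missing values" variant, but without the shift onto the next block.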

Answer