How to calculate with conditions in pandas?




I have a dataframe like this, and I want to calculate and add a new column following the formula: Value = A (where Time=1) + A (where Time=3); I don't want to use A (where Time=5).

Type subType Time   A           Value
 X    a       1      3         =3+9=12
 X    a       3      9  
 X    a       5      9
 X    b       1      4         =4+5=9
 X    b       3      5 
 X    b       5      0
 Y    a       1      1         =1+2=3
 Y    a       3      2  
 Y    a       5      3
 Y    b       1      4         =4+5=9
 Y    b       3      5 
 Y    b       5      2

I know how to do this by selecting the individual cells needed for the formula, but is there a better way to perform the calculation? I suspect I need to add a condition but I'm not sure how. Any suggestions?

Answer

Use Series.eq to build the conditions, Series.cumsum to form group labels, and DataFrame.groupby to sum within each group:

c1 = df.Time.eq(1)
c3 = df.Time.eq(3)
df['Value'] = (df.loc[c1|c3]
                 .groupby(c1.cumsum())
                 .A
                 .transform('sum')
                 .loc[c1])
print(df)
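For reference, a self-contained sketch of this approach; the sample frame is reconstructed here by hand from the table in the question:

```python
import pandas as pd

# sample frame from the question
df = pd.DataFrame({
    'Type':    list('XXXXXXYYYYYY'),
    'subType': ['a'] * 3 + ['b'] * 3 + ['a'] * 3 + ['b'] * 3,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c1 = df.Time.eq(1)
c3 = df.Time.eq(3)
# each Time=1 row starts a new group via c1.cumsum(); keep only the
# Time 1 and 3 rows, sum A per group, then keep the sum on the
# Time=1 rows only -- other rows get NaN when assigned back
df['Value'] = (df.loc[c1 | c3]
                 .groupby(c1.cumsum())
                 .A
                 .transform('sum')
                 .loc[c1])
print(df)
```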

or, if you would rather identify the rows by Time not being equal to 5:

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum')
                     .where(c.shift(fill_value = True))
              )
# Another option is map
c = df['Time'].eq(5)
c_cumsum = c.cumsum()
df['value'] = (c_cumsum.map(df['A'].mask(c)
                              .groupby(c_cumsum)
                              .sum())
                       .where(c.shift(fill_value=True)))

Output

   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9    NaN
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    NaN
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    NaN
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    NaN
11    Y       b     5  2    NaN
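The mask/where variant, runnable end to end (again reconstructing the sample frame). Note that c.cumsum() increments at each Time=5 row, so that row falls into the next group, which is why c.shift(fill_value=True) marks exactly the Time=1 rows:

```python
import pandas as pd

# sample frame from the question
df = pd.DataFrame({
    'Type':    list('XXXXXXYYYYYY'),
    'subType': ['a'] * 3 + ['b'] * 3 + ['a'] * 3 + ['b'] * 3,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c = df['Time'].eq(5)
# mask(c) turns A into NaN on the Time=5 rows so they don't count;
# where(...) keeps the group sum only on the row following a Time=5
# row (and the first row), i.e. the Time=1 rows
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum')
                      .where(c.shift(fill_value=True)))
```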

AVOIDING MISSING VALUES

If you want the group sum broadcast to every row instead of leaving NaN, drop the final where/loc step:

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum'))
#or method 1
#c1 = df.Time.eq(1)
#c3 = df.Time.eq(3)
#df['Value'] = (df.loc[c1|c3]
#                 .groupby(c1.cumsum())
#                 .A
#                 .transform('sum')
#               )
print(df)

Output

   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    9.0
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    3.0
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    9.0
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    0.0

or, filling every row except where Time is 5:

c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum').mask(c))

#or method 1
#c1 = df.Time.eq(1)
#c3 = df.Time.eq(3)
#df['Value'] = (df.loc[c1|c3]
#                 .groupby(c1.cumsum())
#                 .A
#                 .transform('sum')
#                 .loc[c1|c3])
print(df)

Output

   Type subType  Time  A  value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    NaN
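A self-contained check of this variant: the trailing mask(c) blanks the Time=5 rows again after the group sum has been broadcast to every row:

```python
import pandas as pd

# sample frame from the question
df = pd.DataFrame({
    'Type':    list('XXXXXXYYYYYY'),
    'subType': ['a'] * 3 + ['b'] * 3 + ['a'] * 3 + ['b'] * 3,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c = df['Time'].eq(5)
# transform('sum') broadcasts each group's sum to every row of the
# group; the second mask(c) then sets the Time=5 rows back to NaN
df['value'] = (df['A'].mask(c)
                      .groupby(c.cumsum())
                      .transform('sum')
                      .mask(c))
```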

Why not use apply here?

Even on a small DataFrame it is already slower:

%%timeit

(
    df.groupby(by=['Type','subType'])
    .apply(lambda x: x.loc[x.Time!=5].A.sum()) # sum A within each group, excluding Time=5
    .to_frame('Value').reset_index()
    .pipe(lambda x: pd.merge(df, x, on=['Type', 'subType'], how='left'))
)
13.6 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
c = df['Time'].eq(5)
df['value'] = (df['A'].mask(c)
                     .groupby(c.cumsum())
                     .transform('sum')
                     .where(c.shift(fill_value = True))
              )

3.67 ms ± 118 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Source: stackoverflow