# How to calculate with conditions in pandas?

#### Tags: calculation, conditional-statements, jupyter-notebook, pandas, python

I have a dataframe like this. I want to calculate and add a new column that follows the formula `Value = A(where Time=1) + A(where Time=3)`; I don’t want to use A (where Time=5).

```
Type subType Time   A           Value
X    a       1      3         =3+9=12
X    a       3      9
X    a       5      9
X    b       1      4         =4+5=9
X    b       3      5
X    b       5      0
Y    a       1      1         =1+2=3
Y    a       3      2
Y    a       5      3
Y    b       1      4         =4+5=9
Y    b       3      5
Y    b       5      2
```

I know how to do this by selecting the individual cells needed for the formula, but is there a better way to perform the calculation? I suspect I need to add a condition, but I am not sure how; any suggestions?
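For reference, the sample frame can be rebuilt from the table in the question (column names and values as shown above):

```python
import pandas as pd

# Reconstruct the example dataframe from the question
df = pd.DataFrame({
    'Type':    ['X'] * 6 + ['Y'] * 6,
    'subType': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})
print(df)
```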

Use `Series.eq` with `DataFrame.groupby` and `Series.cumsum` to create the groups, then sum within each group.

```
c1 = df.Time.eq(1)
c3 = df.Time.eq(3)

df['Value'] = (df.loc[c1 | c3]          # keep only the Time 1 and 3 rows
                 .groupby(c1.cumsum())  # c1.cumsum() labels each (Type, subType) block
                 .A
                 .transform('sum')      # broadcast each group's sum to its rows
                 .loc[c1])              # assign the result only on the Time == 1 rows
print(df)
```
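The grouping works because `c1.cumsum()` increments on every `Time == 1` row, so each `(Type, subType)` block gets its own integer label. A quick way to see this (the frame is rebuilt here so the snippet runs standalone):

```python
import pandas as pd

# Same frame as in the question
df = pd.DataFrame({
    'Type':    ['X'] * 6 + ['Y'] * 6,
    'subType': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c1 = df.Time.eq(1)
# One label per three-row block
print(c1.cumsum().tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
```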

or, if you would rather identify the rows by non-equality with 5:

```
c = df['Time'].eq(5)
df['Value'] = (df['A'].mask(c)              # exclude A where Time == 5
                      .groupby(c.cumsum())  # c.cumsum() starts a new group after each 5
                      .transform('sum')
                      .where(c.shift(fill_value=True))  # keep only the first row of each block
               )

# Another option is map
c = df['Time'].eq(5)
c_cumsum = c.cumsum()
df['Value'] = (c_cumsum.map(df['A'].mask(c)
                                   .groupby(c_cumsum)
                                   .sum())
                       .where(c.shift(fill_value=True)))
```

Output

```
   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9    NaN
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    NaN
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    NaN
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    NaN
11    Y       b     5  2    NaN
```
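The `where(c.shift(fill_value=True))` step keeps a row only when the *previous* row had `Time == 5`, or when there is no previous row at all (`fill_value=True` covers row 0) — in other words, the first row of each block. A minimal illustration on a toy series:

```python
import pandas as pd

time = pd.Series([1, 3, 5, 1, 3, 5])
c = time.eq(5)

# True exactly on the first row of each block: row 0 (from fill_value=True)
# or any row that directly follows a 5
first_of_block = c.shift(fill_value=True)
print(first_of_block.tolist())  # [True, False, False, True, False, False]
```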

If you do not want the missing values:

```
c = df['Time'].eq(5)
df['Value'] = (df['A'].mask(c)              # exclude A where Time == 5
                      .groupby(c.cumsum())
                      .transform('sum')     # every row gets its group's sum
               )

# or method 1
# c1 = df.Time.eq(1)
# c3 = df.Time.eq(3)
# df['Value'] = (df.loc[c1 | c3]
#                  .groupby(c1.cumsum())
#                  .A
#                  .transform('sum')
#                )
print(df)
```

Output

```
   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    9.0
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    3.0
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    9.0
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    0.0
```

or, filling all rows except those where Time is 5:

```
c = df['Time'].eq(5)
df['Value'] = (df['A'].mask(c)              # exclude A where Time == 5
                      .groupby(c.cumsum())
                      .transform('sum')
                      .mask(c)              # blank the Time == 5 rows again
               )

# or method 1
# c1 = df.Time.eq(1)
# c3 = df.Time.eq(3)
# df['Value'] = (df.loc[c1 | c3]
#                  .groupby(c1.cumsum())
#                  .A
#                  .transform('sum')
#                  .loc[c1 | c3])
print(df)
```

Output

```
   Type subType  Time  A  Value
0     X       a     1  3   12.0
1     X       a     3  9   12.0
2     X       a     5  9    NaN
3     X       b     1  4    9.0
4     X       b     3  5    9.0
5     X       b     5  0    NaN
6     Y       a     1  1    3.0
7     Y       a     3  2    3.0
8     Y       a     5  3    NaN
9     Y       b     1  4    9.0
10    Y       b     3  5    9.0
11    Y       b     5  2    NaN
```
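Putting the mask-based variant together end to end (the frame is rebuilt here so the snippet runs standalone and can be checked against the output above):

```python
import pandas as pd

# Same frame as in the question
df = pd.DataFrame({
    'Type':    ['X'] * 6 + ['Y'] * 6,
    'subType': ['a', 'a', 'a', 'b', 'b', 'b'] * 2,
    'Time':    [1, 3, 5] * 4,
    'A':       [3, 9, 9, 4, 5, 0, 1, 2, 3, 4, 5, 2],
})

c = df['Time'].eq(5)
df['Value'] = (df['A'].mask(c)              # exclude A where Time == 5
                      .groupby(c.cumsum())
                      .transform('sum')
                      .mask(c))             # blank the Time == 5 rows again
print(df)
```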

Why not use `apply` here?

Even on this small dataframe it is already slower:

```
%%timeit

(
df.groupby(by=['Type', 'subType'])
  .apply(lambda x: x.loc[x.Time != 5].A.sum())  # sum A in each group, excluding Time == 5
  .to_frame('Value').reset_index()
  .pipe(lambda x: pd.merge(df, x, on=['Type', 'subType'], how='left'))
)
13.6 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
c = df['Time'].eq(5)