Skip to content
Advertisement

Create forth column based on other columns (lagged) values

df = pd.DataFrame(np.random.randint(0,2,size=(5,3)), columns=list('ABC'))

print (df)

A  B  C
0  1  1  1
1  0  0  1
2  1  1  1
3  1  1  0
4  1  0  1

I would like to create a forth column “D” which will take a value of 1 if:

  • (at least) two column (A, B, C) have a value of 1 or
  • the previous 2 periods had at least two columns with a value of 1.

According to the example above all the rows would have df['D']==1

Advertisement

Answer

We can look for the 3-window rolling sum of a boolean series that marks at least 2 ones per row, and check if the result is 0 (so that D will be too) or not:

df["D"] = df.sum(axis=1).ge(2).rolling(3, min_periods=1).sum().ne(0).astype(int)

samples:

>>> df1

   A  B  C
0  0  0  1
1  0  1  0
2  1  1  1
3  0  1  0
4  1  1  1
5  1  0  0
6  1  0  0
7  0  0  1
8  1  0  1
9  1  0  0

>>> # after..

   A  B  C  D
0  0  0  1  0
1  0  1  0  0
2  1  1  1  1
3  0  1  0  1
4  1  1  1  1
5  1  0  0  1
6  1  0  0  1
7  0  0  1  0
8  1  0  1  1
9  1  0  0  1


>>> df2
   A  B  C
0  0  0  0
1  0  1  0
2  0  0  0
3  1  1  0
4  0  1  1
5  1  0  0
6  0  0  0
7  1  1  1
8  0  1  1
9  0  0  0

>>> # after...

   A  B  C  D
0  0  0  0  0
1  0  1  0  0
2  0  0  0  0
3  1  1  0  1
4  0  1  1  1
5  1  0  0  1
6  0  0  0  1
7  1  1  1  1
8  0  1  1  1
9  0  0  0  1

Advertisement