df = pd.DataFrame(np.random.randint(0,2,size=(5,3)), columns=list('ABC')) print (df)
I would like to create a forth column “D” which will take a value of 1 if:
- (at least) two column (A, B, C) have a value of 1 or
- the previous 2 periods had at least two columns with a value of 1.
According to the example above all the rows would have df['D']==1
Advertisement
Answer
We can look for the 3-window rolling sum of a boolean series that marks at least 2 ones per row, and check if the result is 0 (so that D
will be too) or not:
df["D"] = df.sum(axis=1).ge(2).rolling(3, min_periods=1).sum().ne(0).astype(int)
samples:
>>> df1 A B C 0 0 0 1 1 0 1 0 2 1 1 1 3 0 1 0 4 1 1 1 5 1 0 0 6 1 0 0 7 0 0 1 8 1 0 1 9 1 0 0 >>> # after.. A B C D 0 0 0 1 0 1 0 1 0 0 2 1 1 1 1 3 0 1 0 1 4 1 1 1 1 5 1 0 0 1 6 1 0 0 1 7 0 0 1 0 8 1 0 1 1 9 1 0 0 1 >>> df2 A B C 0 0 0 0 1 0 1 0 2 0 0 0 3 1 1 0 4 0 1 1 5 1 0 0 6 0 0 0 7 1 1 1 8 0 1 1 9 0 0 0 >>> # after... A B C D 0 0 0 0 0 1 0 1 0 0 2 0 0 0 0 3 1 1 0 1 4 0 1 1 1 5 1 0 0 1 6 0 0 0 1 7 1 1 1 1 8 0 1 1 1 9 0 0 0 1