Let’s consider this DataFrame:
$> df
   a   b
0  6  50
1  2  20
2  9  60
3  4  40
4  5  20
I want to compute column d as the max value between:
- the integer 0
- the slice of column b that starts at that row’s index and runs to the end
So I have created a column c (all zeroes) in my DataFrame in order to use DataFrame.max(axis=1). However, short of using apply or looping over the DataFrame, I don’t know how to slice the input values. Expected result would be:
$> df
   a   b  c   d
0  6  50  0  60
1  2  20  0  60
2  9  60  0  60
3  4  40  0  40
4  5  20  0  20
So essentially, the value of d at index 3 is computed (pseudo-code) as max(df[3:,"b"], df[3:,"c"]), and similarly for each row.
Since the input columns (b, c) have already been computed, there has to be a way to slice the input as I calculate each row of d without looping, which is slow.
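For reference, the rule described above can be pinned down with a plain loop. This is only a minimal sketch of the expected result, not a recommended solution: it is the slow version the question wants to avoid, and the helper name `suffix_max_floor` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [6, 2, 9, 4, 5], "b": [50, 20, 60, 40, 20]})

def suffix_max_floor(values, floor=0):
    """For each position i, return max(floor, max(values[i:]))."""
    # Hypothetical helper: quadratic in len(values), shown only to define the target.
    return [max(floor, max(values[i:])) for i in range(len(values))]

expected = suffix_max_floor(df["b"].tolist())
print(expected)  # [60, 60, 60, 40, 20]
```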
Answer
Seems like this could work: reverse “b”, find its cummax, then reverse it back and assign it to “d”. Then use where on “d” to replace any value that is not greater than 0 with 0:
df['d'] = df['b'][::-1].cummax()[::-1]
df['d'] = df['d'].where(df['d']>0, 0)
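Here is the same two-step approach as a self-contained script, assuming the sample frame from the question. The comments trace what each step produces:

```python
import pandas as pd

df = pd.DataFrame({"a": [6, 2, 9, 4, 5], "b": [50, 20, 60, 40, 20]})

# Reverse b, take the running maximum, then reverse back to the original order.
# Reversed b is [20, 40, 60, 20, 50]; its cummax is [20, 40, 60, 60, 60].
df["d"] = df["b"][::-1].cummax()[::-1]

# Keep values greater than 0 and replace everything else with 0.
df["d"] = df["d"].where(df["d"] > 0, 0)

print(df["d"].tolist())  # [60, 60, 60, 40, 20]
```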
We can replace the last line with the one below using clip (thanks @Either), and drop the 2nd reversal (assuming indexes match, since the assignment aligns on the index), making it all a one-liner:
df['d'] = df['b'][::-1].cummax().clip(lower=0)
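The second reversal can be dropped because pandas aligns a Series on its index labels when it is assigned to a DataFrame column, so the still-reversed result lands back on the right rows. A minimal sketch, assuming the default unique integer index from the question (the variable name `rev` is only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [6, 2, 9, 4, 5], "b": [50, 20, 60, 40, 20]})

rev = df["b"][::-1].cummax().clip(lower=0)
print(rev.index.tolist())  # [4, 3, 2, 1, 0]: still in reversed order
print(rev.tolist())        # [20, 40, 60, 60, 60]

# Assignment matches rows by index label, not by position.
df["d"] = rev
print(df["d"].tolist())    # [60, 60, 60, 40, 20]
```

If the index has duplicate labels, this label-based alignment is not something to rely on; keeping the second `[::-1]` is the safer choice in that case.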
Output:
   a   b   d
0  6  50  60
1  2  20  60
2  9  60  60
3  4  40  40
4  5  20  20