
Getting max values based on sliced column

Consider this DataFrame:

$> df
   a    b    
0  6   50   
1  2   20    
2  9   60   
3  4   40   
4  5   20    

I want to compute column d as the max value between:

  • the integer 0
  • the slice of column b running from that row’s index to the end

So I have created a column c (all zeroes) in my DataFrame in order to use DataFrame.max(axis=1). However, short of using apply or looping over the DataFrame, I don’t know how to slice the input values for each row. The expected result would be:

$> df
   a    b  c    d
0  6   50  0   60
1  2   20  0   60
2  9   60  0   60
3  4   40  0   40
4  5   20  0   20

So essentially, the value of d at index 3 is computed (pseudo-code) as max(df[3:,"b"], df[3:,"c"]), and similarly for each row.

Since the input columns (b, c) have already been computed, there has to be a way to slice the input as I calculate each row of d without having to loop, since looping is slow.

Answer

It seems like this could work: reverse “b”, take its cummax, then reverse it back and assign it to “d”. Then use where on “d” to replace any value that is not greater than 0 with 0:

df['d'] = df['b'][::-1].cummax()[::-1]
df['d'] = df['d'].where(df['d']>0, 0)

We can replace the last line with clip (thanks @Either) and drop the second reversal, since assigning a Series to a column aligns it on the index (assuming the indexes match). That makes it a one-liner:

df['d'] = df['b'][::-1].cummax().clip(lower=0)
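To see why the second reversal can be dropped, here is a minimal sketch using the sample data from the question. It assumes a default unique integer index; the names rev and expected are only for illustration. The reversed cummax keeps its original index labels, so the assignment puts each value back on the right row, and a slow row-by-row version of the question's definition is included only as a check:

```python
import pandas as pd

# Sample data from the question (default RangeIndex 0..4).
df = pd.DataFrame({"a": [6, 2, 9, 4, 5], "b": [50, 20, 60, 40, 20]})

# Reverse b and take the running max; the result is still labelled 4, 3, 2, 1, 0.
rev = df["b"][::-1].cummax()

# Assigning a Series aligns on index labels, so no second reversal is needed.
df["d"] = rev.clip(lower=0)

# Slow reference version of the definition: max(0, max of b from row i to the end).
expected = [max(0, df["b"].iloc[i:].max()) for i in range(len(df))]

print(df)
print(df["d"].tolist() == expected)
```

The loop is there only to confirm the result on small data; the vectorised lines are the ones intended for real use.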

Output:

   a   b   d
0  6  50  60
1  2  20  60
2  9  60  60
3  4  40  40
4  5  20  20
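If the index labels are not unique, label-based alignment is not available, so a positional variant may be safer. The following is a sketch of an alternative using NumPy rather than the method above; the duplicated index and the name suffix_max are made up for illustration:

```python
import numpy as np
import pandas as pd

# Same values as the question, but with a deliberately non-unique index.
df = pd.DataFrame(
    {"a": [6, 2, 9, 4, 5], "b": [50, 20, 60, 40, 20]},
    index=list("xxyyz"),
)

# Work on the raw values: reverse, take the running max, reverse back.
b = df["b"].to_numpy()
suffix_max = np.maximum.accumulate(b[::-1])[::-1]

# Element-wise max with 0 plays the role of clip(lower=0).
# Assigning a NumPy array is positional, so index labels do not matter here.
df["d"] = np.maximum(suffix_max, 0)

print(df)
```

Here both reversals are needed, because a plain array carries no index to align on.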