Skip to content
Advertisement

Pandas count number of rows since value > current cell

I have a Dataframe df:

       col_1
    0  51.0  
    1  52.0  
    2  51.5  
    3  51.5  
    4  53.0  
    5  54.0  
    6  52.0  
    7  53.0  
    8  50.5  
    9  50.0  
   10  52.0  

I would like to count the number of rows backwards until a value greater than the current value is found, or until the first row is reached. For example, the result would be:

       col_1 count
    0  51.0  0
    1  52.0  1
    2  51.5  0
    3  51.5  1
    4  53.0  4
    5  54.0  5
    6  52.0  0
    7  53.0  1
    8  50.5  0
    9  50.0  0
   10  52.0  2

I am basically trying to find the value of n in df['col_1'].rolling(n).max() for each value of the series. How can I do this? Thank you in advance.

Advertisement

Answer

Numpy Approach

a = df['col_1'].values[::-1]
m = np.triu(a[:, None] <  a)

i = m.argmax(1)
i[~m.any(1)] = len(m)
df['count'] = (i - range(len(m)) - 1)[::-1]

How it works?

  • Reverse the column col_1 and obtain a numpy view
>>> a

array([52. , 50. , 50.5, 53. , 52. , 54. , 53. , 51.5, 51.5, 52. , 51. ])
  • Using broadcasting compare column a with itself
>>> a[:, None] <  a

[[False False False  True False  True  True False False False False]
 [ True False  True  True  True  True  True  True  True  True  True]
 [ True False False  True  True  True  True  True  True  True  True]
 [False False False False False  True False False False False False]
 [False False False  True False  True  True False False False False]
 [False False False False False False False False False False False]
 [False False False False False  True False False False False False]
 [ True False False  True  True  True  True False False  True False]
 [ True False False  True  True  True  True False False  True False]
 [False False False  True False  True  True False False False False]
 [ True False False  True  True  True  True  True  True  True False]]
  • Using triu (upper triangular) mask the values to the left of main diagonal as we are only concerned with the values that come after the current value. Think of this as the main diagonal representing the current value.
>>> np.triu(a[:, None] < a)

[[False False False  True False  True  True False False False False]
 [False False  True  True  True  True  True  True  True  True  True]
 [False False False  True  True  True  True  True  True  True  True]
 [False False False False False  True False False False False False]
 [False False False False False  True  True False False False False]
 [False False False False False False False False False False False]
 [False False False False False False False False False False False]
 [False False False False False False False False False  True False]
 [False False False False False False False False False  True False]
 [False False False False False False False False False False False]
 [False False False False False False False False False False False]]
  • Find the index of maximum value along the columns axis, this represents the index of first maximum after the current value. If there is no maximum value found we have update those indices to length of series
>>> i = m.argmax(1)
>>> i[~m.any(1)] = len(m)
>>> i

array([ 3,  2,  3,  5,  5, 11, 11,  9,  9, 11, 11])
  • The above indices are zero based but we have to calculate the distance from the main diagonal therefore we can subtract these indices from the indices of main diagonal to calculate the distance (number of rows)
>>> df['count'] = (i - range(len(m)) - 1)[::-1]
>>> df

    col_1  count
0    51.0      0
1    52.0      1
2    51.5      0
3    51.5      1
4    53.0      4
5    54.0      5
6    52.0      0
7    53.0      1
8    50.5      0
9    50.0      0
10   52.0      2
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement