I have a Dataframe df
:
col_1 0 51.0 1 52.0 2 51.5 3 51.5 4 53.0 5 54.0 6 52.0 7 53.0 8 50.5 9 50.0 10 52.0
I would like to count the number of rows backwards until a value greater than the current value is found, or until the first row is reached. For example, the result would be:
col_1 count 0 51.0 0 1 52.0 1 2 51.5 0 3 51.5 1 4 53.0 4 5 54.0 5 6 52.0 0 7 53.0 1 8 50.5 0 9 50.0 0 10 52.0 2
I am basically trying to find the value of n
in df['col_1'].rolling(n).max()
for each value of the series. How can I do this? Thank you in advance.
Advertisement
Answer
Numpy Approach
a = df['col_1'].values[::-1] m = np.triu(a[:, None] < a) i = m.argmax(1) i[~m.any(1)] = len(m) df['count'] = (i - range(len(m)) - 1)[::-1]
How it works?
- Reverse the column
col_1
and obtain a numpy view
>>> a array([52. , 50. , 50.5, 53. , 52. , 54. , 53. , 51.5, 51.5, 52. , 51. ])
- Using broadcasting compare column
a
with itself
>>> a[:, None] < a [[False False False True False True True False False False False] [ True False True True True True True True True True True] [ True False False True True True True True True True True] [False False False False False True False False False False False] [False False False True False True True False False False False] [False False False False False False False False False False False] [False False False False False True False False False False False] [ True False False True True True True False False True False] [ True False False True True True True False False True False] [False False False True False True True False False False False] [ True False False True True True True True True True False]]
- Using
triu
(upper triangular) mask the values to the left of main diagonal as we are only concerned with the values that come after the current value. Think of this as the main diagonal representing the current value.
>>> np.triu(a[:, None] < a) [[False False False True False True True False False False False] [False False True True True True True True True True True] [False False False True True True True True True True True] [False False False False False True False False False False False] [False False False False False True True False False False False] [False False False False False False False False False False False] [False False False False False False False False False False False] [False False False False False False False False False True False] [False False False False False False False False False True False] [False False False False False False False False False False False] [False False False False False False False False False False False]]
- Find the index of maximum value along the columns axis, this represents the index of first maximum after the current value. If there is no maximum value found we have update those indices to length of series
>>> i = m.argmax(1) >>> i[~m.any(1)] = len(m) >>> i array([ 3, 2, 3, 5, 5, 11, 11, 9, 9, 11, 11])
- The above indices are zero based but we have to calculate the distance from the main diagonal therefore we can subtract these indices from the indices of main diagonal to calculate the distance (number of rows)
>>> df['count'] = (i - range(len(m)) - 1)[::-1] >>> df col_1 count 0 51.0 0 1 52.0 1 2 51.5 0 3 51.5 1 4 53.0 4 5 54.0 5 6 52.0 0 7 53.0 1 8 50.5 0 9 50.0 0 10 52.0 2