
How rank is calculated in pandas

I'm having trouble understanding how the rank of a Series is computed. I thought that rank is calculated from the highest value to the lowest value in a series, and that if two numbers are equal, pandas calculates the average of the numbers.

In this example, the highest value is 7. Why do we get rank 5.5 for the number 7 and rank 1.5 for the number 4?

import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])
S1.rank()

Output:

0    5.5
1    4.0
2    5.5
3    3.0
4    1.5
5    1.5
dtype: float64


Answer

With the default settings of rank() (ascending order, ties averaged), the rank is calculated as follows:

  1. Arrange the elements in ascending order and assign ranks starting with ‘1’ for the lowest element.
Elements - 4, 4, 5, 6, 7, 7
Ranks    - 1, 2, 3, 4, 5, 6
  2. Now consider the repeated values: average their ranks and assign that averaged rank to each occurrence.

Since ‘4’ occurs twice, the final rank of each occurrence is the average of 1 and 2, which is 1.5. In the same way, for ‘7’ the final rank of each occurrence is the average of 5 and 6, which is 5.5.

Elements -   4,   4,   5, 6, 7,   7
Ranks    -   1,   2,   3, 4, 5,   6
Final Rank - 1.5, 1.5, 3, 4, 5.5, 5.5
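
The two steps above can be reproduced by hand as a rough sketch. The variable names (`ordered`, `positional`, `manual`, `descending`) are only illustrative, and this is not how pandas implements rank() internally. It just shows that averaging the positional ranks of tied values gives the same numbers as the default S1.rank().

```python
import pandas as pd

S1 = pd.Series([7, 6, 7, 5, 4, 4])

# Step 1: sort ascending and hand out positional ranks 1..n.
ordered = S1.sort_values(kind="stable")
positional = pd.Series(range(1, len(ordered) + 1), index=ordered.index)

# Step 2: tied values share the mean of their positional ranks.
# Grouping by the sorted values puts both 4s together and both 7s together.
manual = positional.groupby(ordered).transform("mean").sort_index()

print(manual.tolist())      # ranks computed by hand
print(S1.rank().tolist())   # pandas default: ascending=True, method="average"

# To rank from the highest value down instead, pass ascending=False.
descending = S1.rank(ascending=False)
print(descending.tolist())
```

So the default ranks from lowest to highest, which is why 7 gets 5.5 rather than 1.5. Passing ascending=False reverses the direction, and the method parameter (for example "min", "max", "first" or "dense") changes how ties are handled.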
User contributions licensed under: CC BY-SA