Skip to content
Advertisement

Sorting (or partially sorting) a list of objects based on a specific attribute

Problem

I have a list of objects. Each object has two attributes: “score” and “coordinates”. I need to find the largest N objects of the list based on the score attribute. The main problem I’m having is sorting the objects using only the score attribute. Sorting can be partial. I’m only interested in the N largest objects.

Current Solution

My current approach is not the most elegant nor the most efficient. The idea is to create a dictionary of object indices and their score, then sort the list of scores and use the dictionary to index the objects that yielded the largest scores.

These are the steps:

  1. Create a list of scores. Each element of the list corresponds to an object. That is, the first entry is the score of the first object, the second entry is the score of the second object, and so on.

  2. Create a dictionary using the object’s scores as key and object index as value.

  3. Order the score list using a heapq to get the N largest objects.

  4. Use the dictionary to get those objects that have the largest scores.

  5. Create a new list with only the N largest-score objects.

Code Snippets

This is my sorting function:

JavaScript

This snippet generates 100 random objects used to test my method. The final loop prints the N = 3 randomly-generated objects with the largest score:

JavaScript

My current method gets the job done, but there has to be a more pythonic (more vectorized) way of achieving the same thing. My background is mostly C++, I’m still learning Python. Any feedback is welcome.

Additional info

Initially, I was looking for something similar to C++’s std:: nth_element. It appears this functionality is somewhat provided in Python by NumPy’s partition. Unfortunately, while std::nth_element supports a predicate for custom ordering, NumPy’s partition does not. I ended up using a heapq, which does the job nicely and sorts in the desired order, but I don’t know the optimal way to sort based on one attribute.

Advertisement

Answer

Tuple is what you needed. Instead of storing scores in a heap, stores a tuple of (score, object) in the heap. It will try to compare by the score and return a list of tuples which you can use to retrieve the original objects. This will save you extra steps to retrieve objects by their scores:

JavaScript

For a real world example: https://akuiper.com/console/g6YuNa_1WClp

Or as @shriakhilc commented, use key parameter in heapq.nlargest to specify you want to compare by score:

JavaScript
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement