Skip to content
Advertisement

Sorting (or partially sorting) a list of objects based on a specific attribute

Problem

I have a list of objects. Each object has two attributes: “score” and “coordinates”. I need to find the largest N objects of the list based on the score attribute. The main problem I’m having is sorting the objects using only the score attribute. Sorting can be partial. I’m only interested in the N largest objects.

Current Solution

My current approach is not the most elegant nor the most efficient. The idea is to create a dictionary of object indices and their score, then sort the list of scores and use the dictionary to index the objects that yielded the largest scores.

These are the steps:

  1. Create a list of scores. Each element of the list corresponds to an object. That is, the first entry is the score of the first object, the second entry is the score of the second object, and so on.

  2. Create a dictionary using the object’s scores as key and object index as value.

  3. Order the score list using a heapq to get the N largest objects.

  4. Use the dictionary to get those objects that have the largest scores.

  5. Create a new list with only the N largest-score objects.

Code Snippets

This is my sorting function:

import random
import heapq


# Gets the N objects with the largest score:
def getLargest(N, objects):
    # Set output objects:
    outobjects = objects

    # Get the total of objects in list:
    totalobjects = len(objects)

    # Check if the total number of objects is bigger than the N requested
    # largest objects:

    if totalobjects > N:

        # Get the "score" attributes from all the objects:
        objectScores = [o.score for o in objects]

        # Create a dictionary with the index of the objects and their score.
        # I'm using a dictionary to keep track of the largest scores and
        # the objects that produced them:
        objectIndices = range(totalobjects)
        objectDictionary = dict(zip(objectIndices, objectScores))

        # Get the N largest objects based on score:
        largestObjects = heapq.nlargest(N, objectScores)
        print(largestObjects)

        # Prepare the output list of objects:
        outobjects = [None] * N

        # Look for those objects that produced the
        # largest score:
        for k in range(N):
            # Get current largest object:
            currentLargest = largestObjects[k]
            # Get its original position on the keypoint list:
            position = objectScores.index(currentLargest)
            # Index the corresponding keypoint and store it
            # in the output list:
            outobjects[k] = objects[position]

    # Done:
    return outobjects

This snippet generates 100 random objects used to test my method. The final loop prints the N = 3 randomly-generated objects with the largest score:

# Create a list with random objects:
totalObjects = 100
randomObjects = []


# Test object class:
class Object(object):
    pass


# Generate a list of random objects
for i in range(totalObjects):
    # Instance of objects:
    tempObject = Object()
    # Set the object's random score
    random.seed()
    tempObject.score = random.random()
    # Set the object's random coordinates:
    tempObject.coordinates = (random.randint(0, 5), random.randint(0, 5))
    # Store object into list:
    randomObjects.append(tempObject)

# Get the 3 largest objects sorted by score:
totalLargestObjects = 3
largestObjects = getLargest(totalLargestObjects, randomObjects)

# Print the filtered objects:
for i in range(len(largestObjects)):
    # Get the current object in the list:
    currentObject = largestObjects[i]
    # Get its score:
    currentScore = currentObject.score
    # Get its coordinates as a tuple (x,y)
    currentCoordinates = currentObject.coordinates
    # Print the info:
    print("object: " + str(i) + " score: " + str(currentScore) + " x: " + str(
        currentCoordinates[0]) + " y: " + str(currentCoordinates[1]))

My current method gets the job done, but there has to be a more pythonic (more vectorized) way of achieving the same thing. My background is mostly C++, I’m still learning Python. Any feedback is welcome.

Additional info

Initially, I was looking for something similar to C++’s std:: nth_element. It appears this functionality is somewhat provided in Python by NumPy’s partition. Unfortunately, while std::nth_element supports a predicate for custom ordering, NumPy’s partition does not. I ended up using a heapq, which does the job nicely and sorts in the desired order, but I don’t know the optimal way to sort based on one attribute.

Advertisement

Answer

Tuple is what you needed. Instead of storing scores in a heap, stores a tuple of (score, object) in the heap. It will try to compare by the score and return a list of tuples which you can use to retrieve the original objects. This will save you extra steps to retrieve objects by their scores:

heapq.nlargest(3, ((obj.score, obj) for obj in randomObjects))
# [(0.9996643881256989, <__main__.Object object at 0x155f730>), (0.9991398955041872, <__main__.Object object at 0x119e928>), (0.9858047551444177, <__main__.Object object at 0x15e38c0>)]

For a real world example: https://akuiper.com/console/g6YuNa_1WClp

Or as @shriakhilc commented, use key parameter in heapq.nlargest to specify you want to compare by score:

heapq.nlargest(3, randomObjects, lambda o: o.score)
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement