
How to get a list of all indices of repeated elements in a numpy array

I’m trying to get the indices of all repeated elements in a numpy array, but the only solution I have found so far is REALLY inefficient for a large input array (more than 20,000 elements): it takes about 9 seconds. The idea is simple:

  1. records_array is a numpy array of timestamps (datetime) from which we want to extract the indices of the repeated timestamps

  2. time_array is a numpy array containing all the timestamps that are repeated in records_array

  3. records is a django QuerySet (which can easily be converted to a list) containing some Record objects. We want to create a list of pairs formed by all possible combinations of the tagId attributes of the Record objects that share a repeated timestamp found in records_array.

Here is the working (but inefficient) code I have so far:

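A minimal sketch of the loop-based approach described in the list above, not the original code, might look like this. It assumes small integers in place of the timestamps and a plain list `tag_ids` (a hypothetical stand-in for the tagId attributes of the Record objects); `time_array` is derived here with `np.unique` only to keep the example self-contained.

```python
from itertools import combinations

import numpy as np

# Hypothetical stand-ins: small integers play the role of the timestamps,
# and plain strings play the role of the tagId attribute of each Record.
records_array = np.array([1, 2, 3, 1, 1, 3, 4, 3, 2])
tag_ids = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]

# Timestamps that occur more than once (the role of time_array).
vals, count = np.unique(records_array, return_counts=True)
time_array = vals[count > 1]

tag_pairs = []
for t in time_array:
    # One full scan of records_array per repeated timestamp:
    # this is the part that gets slow for large inputs.
    positions = np.where(records_array == t)[0]
    # All unordered pairs of tags that share this timestamp.
    tag_pairs.extend(combinations((tag_ids[i] for i in positions), 2))
```

Each pass of the loop compares the whole array against a single timestamp, so the cost grows with the number of records times the number of repeated timestamps.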

I’m quite sure this can be optimized with Numpy, but I can’t find a way to compare records_array with each element of time_array without a for loop (a plain == comparison does not work, since both are arrays).


Answer

A vectorized solution with numpy, relying on the magic of unique().

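One way such a unique()-based solution could be written is sketched below; it is an illustration rather than the answer's exact code. The integer test array is an assumption, and the sketch sorts the array first so that a single call to `np.unique` gives the start of each group of equal values.

```python
import numpy as np

# Assumed test input: small integers standing in for the timestamps.
records_array = np.array([1, 2, 3, 1, 1, 3, 4, 3, 2])

# Stable argsort puts equal values next to each other and keeps the
# original indices in ascending order inside each group.
idx_sort = np.argsort(records_array, kind="stable")
sorted_records = records_array[idx_sort]

# Unique values, the position where each group starts in the sorted
# array, and how many times each value occurs.
vals, idx_start, count = np.unique(
    sorted_records, return_index=True, return_counts=True
)

# Cut the sorted index array at the group boundaries.
groups = np.split(idx_sort, idx_start[1:])

# Keep only the values (and index groups) that occur more than once.
vals_repeated = vals[count > 1]
res = [g for g in groups if g.size > 1]
```

Here `res` holds one index array per repeated value, in the same order as `vals_repeated`.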

The following code was the original answer, which required a bit more memory, using numpy broadcasting and calling unique twice:

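A sketch of how broadcasting and two calls to unique can be combined is given below, again as an illustration under an assumed integer test array rather than the answer's exact code. The extra memory comes from the boolean matrix with one row per repeated value and one column per record.

```python
import numpy as np

# Assumed test input: small integers standing in for the timestamps.
records_array = np.array([1, 2, 3, 1, 1, 3, 4, 3, 2])

# First unique call: unique values, the index of each record's value
# in vals, and the number of occurrences of each value.
vals, inverse, count = np.unique(
    records_array, return_inverse=True, return_counts=True
)

# Positions in vals of the values that occur more than once.
idx_vals_repeated = np.where(count > 1)[0]
vals_repeated = vals[idx_vals_repeated]

# Broadcasting: compare every record against every repeated value,
# giving a boolean matrix of shape (n_repeated, n_records).
mask = inverse == idx_vals_repeated[:, np.newaxis]
rows, cols = np.where(mask)

# Second unique call: where each row (repeated value) starts in rows.
_, row_start = np.unique(rows, return_index=True)

# Split the record indices into one array per repeated value.
res = np.split(cols, row_start[1:])
```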

with, as expected, res = [array([0, 3, 4]), array([1, 8]), array([2, 5, 7])]
