
How to efficiently filter a large Python list?

I have a relatively large list of rows called allListings and want to select all rows whose element at index 14 equals listingID, i.e. every row x where x[14] == listingID.

This is the code I am using: tempRows = list(filter(lambda x: x[14] == listingID, allListings))

The filtering is repeated in a for loop, once for each distinct listingID.

Profiling shows that this line consumes 95% of the runtime in the loop. Is there a more efficient way to filter large lists?


Answer

As suggested in the comments, if you perform multiple operations keyed on the value of this column, you may want to sort and group by it once up front instead of filtering the whole list on every iteration.

>>> from itertools import groupby
>>> a = [[1, 2, 3, 5],
...      [4, 6, 2, 8],
...      [1, 5, 7, 9],
...      [3, 5, 8, 2]]
>>> b = sorted(a, key=lambda x: x[0])
>>> b
[[1, 2, 3, 5], [1, 5, 7, 9], [3, 5, 8, 2], [4, 6, 2, 8]]
>>> c = groupby(b, key=lambda x: x[0])
>>> c
<itertools.groupby object at 0x106b763e0>
>>> d = {k: list(v) for k, v in c}
>>> d
{1: [[1, 2, 3, 5], [1, 5, 7, 9]], 3: [[3, 5, 8, 2]], 4: [[4, 6, 2, 8]]}
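If the rows do not need to come out sorted by key, an equivalent dictionary can probably be built in a single pass with collections.defaultdict, which skips the sort entirely. This is a different technique from the sorted/groupby approach above, shown only as a sketch; the name d_alt is made up for illustration.

```python
from collections import defaultdict

a = [[1, 2, 3, 5],
     [4, 6, 2, 8],
     [1, 5, 7, 9],
     [3, 5, 8, 2]]

# d_alt is a hypothetical name; it maps each first-column value
# to the list of rows that carry that value.
d_alt = defaultdict(list)
for row in a:
    d_alt[row[0]].append(row)  # one pass over the data, no sorting
```

The keys appear in first-seen order rather than sorted order, but lookups by key behave the same way.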

Now, if you need all lists where the first element is 1, you simply need:

>>> d[1]
[[1, 2, 3, 5], [1, 5, 7, 9]]

Or, if you want every list except those with 1 in the first position:

>>> [x for k, v in d.items() 
...    if k != 1 
...    for x in v] 
[[3, 5, 8, 2], [4, 6, 2, 8]]

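This is a simplified example, but it should carry over directly to your situation: group on index 14 instead of index 0, build the dictionary once before the loop, and look up each listingID in it.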
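Applied to the question, the idea might look roughly like the sketch below. The helper name group_rows_by_column and the three sample rows are invented for illustration; only allListings, listingID, tempRows and the index 14 come from the question. It also assumes the IDs can be compared with each other, since they are sorted.

```python
from itertools import groupby
from operator import itemgetter


def group_rows_by_column(rows, column):
    """Return {value: [rows having that value at index `column`]}."""
    key = itemgetter(column)
    # groupby only merges adjacent items with equal keys,
    # so the rows are sorted by the same key first.
    return {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}


# Made-up stand-in for allListings: 15 columns, listing ID at index 14.
allListings = [
    ["a"] + [0] * 13 + [101],
    ["b"] + [0] * 13 + [102],
    ["c"] + [0] * 13 + [101],
]

# Build the index once, outside the loop.
rows_by_listing = group_rows_by_column(allListings, 14)

for listingID, tempRows in rows_by_listing.items():
    # tempRows holds the same rows the filter(...) call selected for this ID.
    pass
```

The cost of sorting and grouping is paid once, and each listingID is then served by a dictionary lookup rather than a fresh scan of the whole list.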

User contributions licensed under: CC BY-SA