
Iterating Through List of Lists, Maintaining List Structure

Say I have the following list of lists of names:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

I want to return only the “Matts” in the list, but I also want to maintain the list-of-lists structure. So I want to return:

[['Matt', 'Matt'], ['Matt']]

I've tried something like this, but it flattens everything into one big list:

matts = [name for namelist in names for name in namelist if name=="Matt"]
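To see the problem concretely, here is a minimal sketch that runs the flattening comprehension on the `names` list from the question. The two `for` clauses behave like nested loops, but every kept item lands in one output list:

```python
names = [['Matt', 'Matt', 'Paul'], ['Matt']]

# Both `for` clauses feed a single output list, so the inner
# list boundaries are lost.
flat = [name for namelist in names for name in namelist if name == "Matt"]
print(flat)  # ['Matt', 'Matt', 'Matt']
```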

I know the following works, but I want to avoid explicitly iterating through the lists and appending. Is there a more concise way?

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
matts = []
for namelist in names:
    matts_namelist = []
    for name in namelist:
        if name == "Matt":
            matts_namelist.append(name)
    matts.append(matts_namelist)


Answer

Use a nested list comprehension, as below:

names = [['Matt', 'Matt', 'Paul'], ['Matt']]
res = [[name for name in lst if name == "Matt"] for lst in names]
print(res)

Output

[['Matt', 'Matt'], ['Matt']]
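The same pattern works for any test on the inner items. As a sketch, the helper `filter_nested` and its `keep` parameter below are hypothetical names, not part of any library. Note that an inner list with no matches is kept as an empty list, so the outer structure is always preserved:

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def filter_nested(lists: Iterable[Iterable[T]], keep: Callable[[T], bool]) -> List[List[T]]:
    """Hypothetical helper: filter each inner list, keeping the outer structure."""
    # The outer comprehension builds exactly one new list per inner list,
    # so inner lists without matches become [] instead of disappearing.
    return [[item for item in inner if keep(item)] for inner in lists]


names = [['Matt', 'Matt', 'Paul'], ['Matt'], ['Paul']]
print(filter_nested(names, lambda n: n == "Matt"))
# [['Matt', 'Matt'], ['Matt'], []]
print(filter_nested(names, lambda n: n.startswith("P")))
# [['Paul'], [], ['Paul']]
```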

The above nested list comprehension is equivalent to the following for-loop:

res = []
for lst in names:
    res.append([name for name in lst if name == "Matt"])
print(res)

A third, functional alternative uses filter and partial:

from operator import eq
from functools import partial

names = [['Matt', 'Matt', 'Paul'], ['Matt']]

eq_matt = partial(eq, "Matt")
res = [[*filter(eq_matt, lst)] for lst in names]
print(res)
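For readers unfamiliar with the two pieces: `partial(eq, "Matt")` pre-binds `"Matt"` as the first argument of `operator.eq`, so calling it with `x` evaluates `"Matt" == x`. `filter` returns a lazy iterator, and `[*...]` unpacks it into a list, which is equivalent to calling `list(...)`. A small check of both:

```python
from functools import partial
from operator import eq

eq_matt = partial(eq, "Matt")

# partial fixes "Matt" as the first positional argument of eq
print(eq_matt("Matt"), eq_matt("Paul"))  # True False

# filter is lazy; unpacking it inside [] materializes the matches
print([*filter(eq_matt, ['Matt', 'Matt', 'Paul'])])  # ['Matt', 'Matt']
print(list(filter(eq_matt, ['Matt', 'Paul', 'Matt'])))  # ['Matt', 'Matt']
```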

Micro-Benchmark

%timeit [[*filter(eq_matt, lst)] for lst in names]
56.3 µs ± 519 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit [[name for name in lst if "Matt" == name] for lst in names]
26.9 µs ± 355 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Setup (for the micro-benchmarks)

import random
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
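`%timeit` is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The loop count below is an arbitrary choice, the sketch reports the best repeat rather than the mean ± std. dev. that `%timeit` prints, and actual timings will vary with machine and Python version:

```python
import random
import timeit
from functools import partial
from operator import eq

# Same setup as above: 50 inner lists of 10 names each.
population = ["Matt", "James", "William", "Charles", "Paul", "John"]
names = [random.choices(population, k=10) for _ in range(50)]
eq_matt = partial(eq, "Matt")


def with_filter():
    return [[*filter(eq_matt, lst)] for lst in names]


def with_comprehension():
    return [[name for name in lst if "Matt" == name] for lst in names]


# Both approaches should agree before we bother timing them.
assert with_filter() == with_comprehension()

number = 1_000  # arbitrary: calls per repeat
for func in (with_filter, with_comprehension):
    # repeat() returns the total seconds for `number` calls, once per repeat
    times = timeit.repeat(func, number=number, repeat=7)
    best_us = min(times) / number * 1e6
    print(f"{func.__name__}: best {best_us:.1f} µs per call")
```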

Full Benchmark

Candidates

from functools import partial
from operator import eq


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_needle = partial(eq, needle)  # eq_needle(x) evaluates needle == x
    return [[*filter(eq_needle, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    # Rebuild each inner list from the number of matches it contains
    return [[needle] * lst.count(needle) for lst in names]
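Before comparing speed, it is worth confirming that the three candidates agree. This sketch repeats the definitions (with the imports `functional_approach` needs) so it runs on its own, and checks them on random inputs of several sizes. Note that `count_approach` rebuilds each inner list from `list.count` instead of filtering it; that is only valid because every kept item is identical to `needle`, so it would not carry over to a predicate such as case-insensitive matching.

```python
import random
from functools import partial
from operator import eq


def nested_list_comprehension(names, needle="Matt"):
    return [[name for name in lst if needle == name] for lst in names]


def functional_approach(names, needle="Matt"):
    eq_needle = partial(eq, needle)
    return [[*filter(eq_needle, lst)] for lst in names]


def count_approach(names, needle="Matt"):
    # list.count scans each inner list once; the result is rebuilt, not filtered
    return [[needle] * lst.count(needle) for lst in names]


population = ["Matt", "James", "William", "Charles", "Paul", "John"]
for size in (0, 1, 100, 1000):
    names = [random.choices(population, k=10) for _ in range(size)]
    expected = nested_list_comprehension(names)
    assert functional_approach(names) == expected
    assert count_approach(names) == expected
print("all candidates agree")
```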

[Plot of alternative solutions]

The results above were obtained for an outer list whose length varies from 100 to 1000 elements, where each element is a list of 10 strings chosen at random from a population of 14 strings (names). The code for reproducing the results can be found here. As can be seen from the plot, the most performant solution is the one from @rv.kvetch.

User contributions licensed under: CC BY-SA