Skip to content
Advertisement

python3 join lists that have same value in list of lists

I have similar question to one that has been asked several years ago, the link is down here. the thing is that all answers are in python 2 and does work for me. my lists are huge so time is important. if anyone can solve that for python3, that will really help.

Consider this list of lists:

l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]

I want to combine all the lists that have at least one number in common, that will be done iteratively until it finished and no dubletters will go with. The result will be:

combine(l) = [ [1,2,3,4], [5,6,7,8,9] ]

is there any neat way to do this, perhaps with itertools?

Combine lists that have at least a number in common in a list of lists, Python

Advertisement

Answer

This can be seen as a graph problem in which you merge subgraphs and need to find the connected components.

Here is your graph:

graph

networkx

Using networkx you can do:

import networkx as nx
from itertools import chain, pairwise
# python < 3.10 use this recipe for pairwise instead
# from itertools import tee
# def pairwise(iterable):
#     a, b = tee(iterable)
#     next(b, None)
#     return zip(a, b)

G = nx.Graph()
G = nx.from_edgelist(chain.from_iterable(pairwise(e) for e in l))
G.add_nodes_from(set.union(*map(set, l))) # adding single items

list(nx.connected_components(G))

output:

[{1, 2, 3, 4}, {5, 6, 7, 8, 9}]

python

Now, you can use pure python to perform the same thing, finding the connected components and merging them.

An example code is nicely described in this post (archive.org link for long term).

In summary, the first step is building the list of neighbors, then a recursive function is used to join the neighbors of neighbors keeping track of the already seen ones.

from collections import defaultdict 
  
#merge function to  merge all sublist having common elements. 
def merge_common(lists): 
    neigh = defaultdict(set) 
    visited = set() 
    for each in lists: 
        for item in each: 
            neigh[item].update(each) 
    def comp(node, neigh = neigh, visited = visited, vis = visited.add): 
        nodes = set([node]) 
        next_node = nodes.pop 
        while nodes: 
            node = next_node() 
            vis(node) 
            nodes |= neigh[node] - visited 
            yield node 
    for node in neigh: 
        if node not in visited: 
            yield sorted(comp(node))

example:

merge_common(l)
# [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement