I have similar question to one that has been asked several years ago, the link is down here. the thing is that all answers are in python 2 and does work for me. my lists are huge so time is important. if anyone can solve that for python3, that will really help.
Consider this list of lists:
l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
I want to combine all the lists that have at least one number in common, that will be done iteratively until it finished and no dubletters will go with. The result will be:
combine(l) = [ [1,2,3,4], [5,6,7,8,9] ]
is there any neat way to do this, perhaps with itertools?
Combine lists that have at least a number in common in a list of lists, Python
Advertisement
Answer
This can be seen as a graph problem in which you merge subgraphs and need to find the connected components.
Here is your graph:
networkx
Using networkx
you can do:
import networkx as nx from itertools import chain, pairwise # python < 3.10 use this recipe for pairwise instead # from itertools import tee # def pairwise(iterable): # a, b = tee(iterable) # next(b, None) # return zip(a, b) G = nx.Graph() G = nx.from_edgelist(chain.from_iterable(pairwise(e) for e in l)) G.add_nodes_from(set.union(*map(set, l))) # adding single items list(nx.connected_components(G))
output:
[{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
python
Now, you can use pure python to perform the same thing, finding the connected components and merging them.
An example code is nicely described in this post (archive.org link for long term).
In summary, the first step is building the list of neighbors, then a recursive function is used to join the neighbors of neighbors keeping track of the already seen ones.
from collections import defaultdict #merge function to merge all sublist having common elements. def merge_common(lists): neigh = defaultdict(set) visited = set() for each in lists: for item in each: neigh[item].update(each) def comp(node, neigh = neigh, visited = visited, vis = visited.add): nodes = set([node]) next_node = nodes.pop while nodes: node = next_node() vis(node) nodes |= neigh[node] - visited yield node for node in neigh: if node not in visited: yield sorted(comp(node))
example:
merge_common(l) # [[1, 2, 3, 4], [5, 6, 7, 8, 9]]