Merge lists in a dataframe column if they share a common value

Question

What I need: I have a dataframe where the elements of a column are lists, and no list contains duplicate elements. For example, a dataframe like the following: I would like to obtain a dataframe where, if at least one number contained in the list at row i is also contained in the list at row j,

Accepted Answer

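This is not straightforward: merging lists has many pitfalls. One solid approach is to use a specialized library, for example networkx, and treat this as a graph problem. You can generate edges between successive items of each list and find the connected components; each component is one merged list.

Here is your graph:

[Figure: the resulting graph (image not included)]

You can thus:

- generate successive edges with add_edges_from
- find the connected_components
- craft a dictionary mapping each item to its component, and map the first item of each list through it
- groupby and merge the lists (you could use the connected components directly, but I'm giving a pandas solution in case you have more columns to handle)

    import networkx as nx

    # df is the question's dataframe, with the lists in column 'col1'
    G = nx.Graph()
    for l in df['col1']:
        G.add_nodes_from(l)              # also registers single-item lists, which yield no edges
        G.add_edges_from(zip(l, l[1:]))  # link successive items of the list

    # map each item to the index of its connected component
    groups = {k: v for v, comp in enumerate(nx.connected_components(G)) for k in comp}
    # {1: 0, 2: 0, 4: 0, 8: 0, 10: 0, 19: 0, 16: 1, 17: 1, 15: 1, 18: 2, 3: 2}

    # group rows by the component of their first item, then merge each group's lists
    out = (df.groupby(df['col1'].str[0].map(groups), as_index=False)
             .agg(lambda x: sorted(set().union(*x)))
          )

output:

                       col1
    0  [1, 2, 4, 8, 10, 19]
    1          [15, 16, 17]
    2               [3, 18]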
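If you would rather avoid the networkx dependency, the same connected-components idea can be sketched with a disjoint-set (union-find) structure in plain Python. This is a swapped-in technique, not the answer's method. The function name merge_overlapping and the sample rows below are hypothetical, and the sketch assumes the list items are hashable and mutually comparable (so each group can be sorted).

```python
from typing import Hashable, Iterable, List


def merge_overlapping(lists: Iterable[Iterable[Hashable]]) -> List[list]:
    """Merge lists that share at least one element, transitively.

    Hypothetical helper: a union-find alternative to the networkx approach.
    Returns one sorted list per group, in order of first appearance.
    """
    parent: dict = {}

    def find(x):
        # Register x if unseen, then walk to its root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Attach the root of b's set under the root of a's set.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for lst in lists:
        items = list(lst)
        for item in items:
            find(item)  # make sure single-item lists are registered
        for a, b in zip(items, items[1:]):
            union(a, b)  # successive items belong to the same group

    # Collect items by root; dict order follows first appearance of each group.
    groups: dict = {}
    for item in parent:
        groups.setdefault(find(item), set()).add(item)
    return [sorted(g) for g in groups.values()]


if __name__ == "__main__":
    rows = [[1, 2, 4], [16, 17], [4, 8, 10], [18, 3], [19, 10], [15, 16]]
    print(merge_overlapping(rows))
    # [[1, 2, 4, 8, 10, 19], [15, 16, 17], [3, 18]]
```

With pandas, something like pd.DataFrame({'col1': merge_overlapping(df['col1'])}) would rebuild a one-column result, but unlike the groupby solution above it does not carry along any other columns.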

Answer