Merge lists in a dataframe column if they share a common value

What I need:

I have a dataframe where the elements of a column are lists. No list contains duplicate elements. For example, a dataframe like the following:

JavaScript

I would like to obtain a dataframe where, if at least one number in the list at row i is also in the list at row j, the two lists are merged (without duplicates). A value can also be shared by more than two lists; in that case I want all lists that share at least one value to be merged.

JavaScript

Neither the order of the rows in the output dataframe nor the order of the values inside a list matters.


What I tried:

I found this answer, which shows how to tell whether at least one item in a list is contained in another list, e.g.

JavaScript

Returns True, since 2 is contained in both lists.
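
A minimal sketch of that kind of check, assuming two plain Python lists. The helper name share_a_value is hypothetical and is not taken from the linked answer:

```python
def share_a_value(a, b):
    """Return True if lists a and b have at least one element in common."""
    # isdisjoint stops at the first common element it finds
    return not set(a).isdisjoint(b)


print(share_a_value([1, 2, 3], [2, 5]))  # 2 is in both lists
print(share_a_value([1, 2, 3], [4, 5]))  # nothing in common
```

On its own, a pairwise test like this does not handle chains (A shares a value with B, and B with C), which is why merging needs more than one pass over the pairs.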

I have also found this useful answer that shows how to compare each row of a dataframe with every other row. The answer applies a custom function to each row of the dataframe using a lambda.

JavaScript

However, I’m not sure how to put these two things together, i.e. how to write the func method. I also don’t know whether this approach is even feasible, since the result will probably have fewer rows than the original dataframe.

Thanks!

Answer

This is not straightforward. Merging lists has many pitfalls.

One solid approach is to use a specialized library such as networkx and treat this as a graph problem: generate edges between the successive items of each list, then find the connected components.
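
As a rough illustration of the graph idea on plain Python lists (the data here is hypothetical, not the data from the question), each list becomes a chain of edges and each connected component becomes one merged list:

```python
import networkx as nx

# Hypothetical lists, not the data from the question
lists = [[1, 2], [2, 3], [4, 5], [6], [5, 7]]

G = nx.Graph()
for lst in lists:
    G.add_nodes_from(lst)                # keeps single-element lists in the graph
    G.add_edges_from(zip(lst, lst[1:]))  # link each item to the next one

# Each connected component is one group of values that belong together
merged = [sorted(component) for component in nx.connected_components(G)]
print(merged)
```

With this input the values fall into three groups: {1, 2, 3}, {4, 5, 7} and {6}. The add_nodes_from call is an addition for safety: a single-element list produces no edges, so without it that value would never enter the graph.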

Here is your graph:

[Figure: networkx graph list merging]

You can thus:

  • generate successive edges with add_edges_from
  • find the connected_components
  • craft a dictionary that maps each value to its component, and map the first item of each list through it to get a group label
  • groupby and merge the lists (you could use the connected components directly but I’m giving a pandas solution in case you have more columns to handle)
JavaScript

output:

JavaScript
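
The steps listed above could be sketched roughly as follows. This is not necessarily the answer's original code: the data is hypothetical, the column name col and the variable names are assumptions, and every list is assumed to be non-empty.

```python
import networkx as nx
import pandas as pd

# Hypothetical data; the column name "col" is an assumption
df = pd.DataFrame({"col": [[1, 2], [2, 3], [4, 5], [6], [5, 7]]})

# 1. Build the graph: successive items of each list become edges
G = nx.Graph()
for lst in df["col"]:
    G.add_nodes_from(lst)                # also covers single-element lists
    G.add_edges_from(zip(lst, lst[1:]))

# 2. Map every value to the index of its connected component
group_of = {
    value: i
    for i, component in enumerate(nx.connected_components(G))
    for value in component
}

# 3. Label each row by the component of its first item (lists assumed non-empty)
labels = df["col"].map(lambda lst: group_of[lst[0]])

# 4. Group rows by label and take the union of their lists
out = (
    df.groupby(labels)["col"]
      .agg(lambda lists: sorted(set().union(*lists)))
      .reset_index(drop=True)
      .to_frame()
)
print(out)
```

Any item of a list works as the lookup key in step 3, since all items of one list end up in the same component. If the dataframe has other columns, they can be aggregated in the same groupby with whatever function suits them.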
User contributions licensed under: CC BY-SA