Most efficient way to find shared members of a list inside a dataframe?

Hello experts: I’m looking at so-called ‘COVID-19 bubbles’ inside pro cycling. I’ve compiled a list of riders for each team and a list of each race they’ve done. There are about 30 riders per team, and there have been a few dozen races since the sport started up again in July.

I’m stumped right now on how to proceed with analyzing the data or if this structure is even the right approach.

My end goal is to have a sort of Venn diagram of which riders raced together the most, one for each team, to visualize whether they stuck to these bubbles (e.g. eight riders doing the same six races, a different group of eight riders doing a different list of races, etc.).

Feel free to tag if duplicate/inappropriate etc. But a hand up would be appreciated!

My dataframe looks like this for one team:


Answer

Consider a pandas solution: migrate your dictionary into a single dataframe with concat, then run a self-join (to use SQL speak) on that dataframe, avoiding reverse duplicates, and finally count the rider pairs with groupby:

Data

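The original code for this step is not available here, so the following is only a minimal sketch of what the Data step might look like. It assumes the dictionary maps each race name to the list of riders from one team who started it; the `races` dictionary, the rider and race names, and the `race`/`rider` column names are all hypothetical placeholders.

```python
import pandas as pd

# Hypothetical input: race name -> riders from one team who started it.
# These names are placeholders, not the asker's actual data.
races = {
    "Race A": ["Rider 1", "Rider 2", "Rider 3"],
    "Race B": ["Rider 1", "Rider 2"],
    "Race C": ["Rider 3", "Rider 4"],
}

# Build one small frame per race, then stack them into a single
# long table with one row per (race, rider) combination.
long_df = pd.concat(
    [pd.DataFrame({"race": race, "rider": riders}) for race, riders in races.items()],
    ignore_index=True,
)

print(long_df)
```

A long table like this, one row per rider per race, is usually easier to join and group than a wide table with one column per race.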

Self join

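As a rough illustration of the self-join idea (not the answerer's original code), the sketch below merges a small long table with itself on the race column and keeps only rows where the first rider sorts before the second. That filter is one common way to drop both self-pairs and reverse duplicates. The table contents and the `_x`/`_y` suffixes are assumptions made for the example.

```python
import pandas as pd

# Hypothetical long table: one row per (race, rider).
long_df = pd.DataFrame({
    "race": ["Race A", "Race A", "Race A", "Race B", "Race B"],
    "rider": ["Rider 1", "Rider 2", "Rider 3", "Rider 1", "Rider 2"],
})

# Join the table to itself on race, so every rider in a race is
# paired with every rider in that same race (including themselves).
pairs = long_df.merge(long_df, on="race", suffixes=("_x", "_y"))

# Keep each unordered pair once: the strict comparison removes
# self-pairs (A, A) and the reversed copy (B, A) of every (A, B).
pairs = pairs[pairs["rider_x"] < pairs["rider_y"]].reset_index(drop=True)

print(pairs)
```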

Aggregation

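A hedged sketch of the aggregation step, assuming a pairs table with hypothetical `rider_x` and `rider_y` columns and one row per race that a pair rode together. Grouping by the pair and taking the group size gives the number of shared races; the `races_together` column name is made up for this example.

```python
import pandas as pd

# Hypothetical pairs table: one row per race in which two riders both started.
pairs = pd.DataFrame({
    "race": ["Race A", "Race A", "Race A", "Race B"],
    "rider_x": ["Rider 1", "Rider 1", "Rider 2", "Rider 1"],
    "rider_y": ["Rider 2", "Rider 3", "Rider 3", "Rider 2"],
})

# Count how many races each pair shared, most frequent pairs first.
pair_counts = (
    pairs.groupby(["rider_x", "rider_y"])
    .size()
    .reset_index(name="races_together")
    .sort_values("races_together", ascending=False, ignore_index=True)
)

print(pair_counts)
```

Pairs with high counts would point to riders who were kept in the same bubble; the counts could also be pivoted into a rider-by-rider matrix for a heatmap if a Venn diagram gets too crowded.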
User contributions licensed under: CC BY-SA