Skip to content
Advertisement

All combinations of a row in a dataframe

i have the following Dataframe (df = ) with around 40 mio rows.

No  Color
A   Red
A   Black
B   White
B   Orange
B   Green
C   Red
C   Green
D   Yellow
D   Orange
D   Green

i try to have the following output:

No  Color
A   Red
A   Black
B   White
B   Orange
B   White
B   Green
B   Orange 
B   Green
C   Red
C   Green
D   Yellow
D   Orange
D   Yellow
D   Green
D   Orange
D   Green   

at first i thought to use itertools combinations, it.combinations(Colors[“Colors”],2), but the problem was, that it gives me the combinations of the whole column and don’t correlate to the column “No”. The next try was to aggregate the whole dataframe to have all the needed combination in a list and only have about 5000 rows

from:

No Colors
A   Red, Black  
B   White, Orange, Green
C   Red, Green, Yellow
D   Orange, Green   

to:

No  Colors              CombColors
A   Red,Black           Red, Black
B   White,Orange,Green  White, Orange, White, Green, Orange, Green
C   Red,Green,Yellow    Red, Green, Red, Yellow, Green, Yellow
D   Orange,Green        Orange, Green
   

with: df.apply(lambda x: list(it.combinations(x,2), axis =1) but this also doesn’t work (all combinations in each row).

What is the right solution to achieve the wanted output (of attempt 1 or attempt 2)?

Edit: 1

if i try to use df.apply(lambda x: list(it.combinations(x,2), axis =1) i generate following column

 No     Colors              CombColors
    A   Red,Black           [([Red, Black], [(['Red', 'Black'], ['White'..
    B   White,Orange,Green  [([White, Orange, Green], [(['Red', 'Black']...
    C   Red,Green,Yellow    [([Red, Green], [(['Red', 'Black'], ['White'...
    D   Orange,Green        [([Yellow, Orange, Green], [(['Red', 'Black']...

i think on problem is, i aggregate the Colors by a tuple or list (tuple is empty []). df.groupby("No")["Color"].apply(list).agg(tuple).to_frame()

nevertheless the itertool gives me a combination of every column.

Edit 2: the solutions of alparslan mimaroğlu and Henry Vik work both and are (for me) astonishing. Till now i cannot understand the logic behind these, but i’ll try! Thanks!

Advertisement

Answer

You can groupby by No and create the lists you want quite easily.

def combinations(group): 
    return pd.Series(list(it.combinations(group['Color'].unique(), 2)))

df.groupby('No').apply(combinations).explode()

if you don’t make it explode it will return you a list of color combinations

Advertisement