i have the following Dataframe (df = ) with around 40 mio rows.
No Color A Red A Black B White B Orange B Green C Red C Green D Yellow D Orange D Green
i try to have the following output:
No Color A Red A Black B White B Orange B White B Green B Orange B Green C Red C Green D Yellow D Orange D Yellow D Green D Orange D Green
at first i thought to use itertools combinations, it.combinations(Colors[“Colors”],2), but the problem was, that it gives me the combinations of the whole column and don’t correlate to the column “No”. The next try was to aggregate the whole dataframe to have all the needed combination in a list and only have about 5000 rows
from:
No Colors A Red, Black B White, Orange, Green C Red, Green, Yellow D Orange, Green
to:
No Colors CombColors A Red,Black Red, Black B White,Orange,Green White, Orange, White, Green, Orange, Green C Red,Green,Yellow Red, Green, Red, Yellow, Green, Yellow D Orange,Green Orange, Green
with: df.apply(lambda x: list(it.combinations(x,2), axis =1)
but this also doesn’t work (all combinations in each row).
What is the right solution to achieve the wanted output (of attempt 1 or attempt 2)?
Edit: 1
if i try to use df.apply(lambda x: list(it.combinations(x,2), axis =1)
i generate following column
No Colors CombColors A Red,Black [([Red, Black], [(['Red', 'Black'], ['White'.. B White,Orange,Green [([White, Orange, Green], [(['Red', 'Black']... C Red,Green,Yellow [([Red, Green], [(['Red', 'Black'], ['White'... D Orange,Green [([Yellow, Orange, Green], [(['Red', 'Black']...
i think on problem is, i aggregate the Colors by a tuple or list (tuple is empty []). df.groupby("No")["Color"].apply(list).agg(tuple).to_frame()
nevertheless the itertool gives me a combination of every column.
Edit 2: the solutions of alparslan mimaroğlu and Henry Vik work both and are (for me) astonishing. Till now i cannot understand the logic behind these, but i’ll try! Thanks!
Advertisement
Answer
You can groupby by No and create the lists you want quite easily.
def combinations(group): return pd.Series(list(it.combinations(group['Color'].unique(), 2))) df.groupby('No').apply(combinations).explode()
if you don’t make it explode it will return you a list of color combinations