I have a pandas DataFrame that looks something like this:
Feature A | Feature B | Feature C |
---|---|---|
A1 | B1 | C1 |
A2 | B2 | C2 |
Given k as input, I want every combination of values taken from k different features. For example, for k = 2 I want:
[{A:A1, B:B1}, {A:A1, B:B2}, {A:A1, C:C1}, {A:A1, C:C2}, {A:A2, B:B1}, {A:A2, B:B2}, {A:A2, C:C1}, {A:A2, C:C2}, {B:B1, C:C1}, {B:B1, C:C2}, {B:B2, C:C1}, {B:B2, C:C2}]
How can I achieve that?
Answer
This is probably not very efficient, but it works for small inputs.
First, determine the unique combinations of k columns.
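    from itertools import combinations

    import pandas as pd

    # df is the DataFrame from the question
    k = 2
    # every way of choosing k distinct columns, as tuples of column names
    cols = list(combinations(df.columns, k))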
Then use MultiIndex.from_product to get the Cartesian product of the values in each group of k columns. This yields a flat list of tuples, one per combination of values.
    result = []
    for c in cols:
        # Cartesian product of the values in the chosen columns, as a list of tuples
        result += pd.MultiIndex.from_product([df[x] for x in c]).values.tolist()
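If you need the dictionaries from the question rather than plain tuples, here is a minimal sketch of one way to do it with itertools.product instead of MultiIndex.from_product. The function name value_combinations is made up for illustration, the column names are shortened to A, B and C, and dropping duplicate values within each column is my assumption about what you want.

```python
from itertools import combinations, product

import pandas as pd


def value_combinations(df, k):
    """Return one dict per combination of values drawn from k distinct columns.

    Hypothetical helper, not part of pandas.
    """
    result = []
    for cols in combinations(df.columns, k):
        # Assumption: each distinct value in a column should be used only once.
        values = [df[col].drop_duplicates().tolist() for col in cols]
        for combo in product(*values):
            # Pair each value with the name of the column it came from.
            result.append(dict(zip(cols, combo)))
    return result


df = pd.DataFrame({"A": ["A1", "A2"], "B": ["B1", "B2"], "C": ["C1", "C2"]})
combos = value_combinations(df, 2)
print(len(combos))  # 12
print(combos[0])    # {'A': 'A1', 'B': 'B1'}
```

The entries come out grouped by column pair (all A/B pairs, then A/C, then B/C), so the order differs from the list in the question, but the set of dictionaries is the same.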