
How to create multiple combinations of columns from a pandas dataframe?

I’ll illustrate my problem with a drawing:

[Drawing not available]

I have a pandas DataFrame with 13 columns of 6 different types. I want to take one column of each type and build a new table from them for subsequent analyses. In the end I want to create (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 new DataFrames from the original one.

The columns don’t have specific names, but they could be, for example: A1, A2, A3, B1, C1, C2, D1, D2, E1, E2, E3, E4, F1

Does anyone have an idea how to implement this in Python?


Answer

If you can separate the column names into lists according to their types, then your problem becomes a question of finding the Cartesian product of these lists. Once you have the Cartesian product, you can iterate over it and select from your DataFrame with each combination of column names (there are (3 choose 1) * 1 * (2 choose 1) * (2 choose 1) * (4 choose 1) * 1 = 48 of them).

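import pandas as pd

# Column names grouped by type
A_cols = ['A1','A2','A3']
B_cols = ['B1']
C_cols = ['C1','C2']
D_cols = ['D1','D2']
E_cols = ['E1','E2','E3','E4']
F_cols = ['F1']

# column_combos is a MultiIndex of length 48; each entry is a tuple with one column name per type
column_combos = pd.MultiIndex.from_product([A_cols,B_cols,C_cols,D_cols,E_cols,F_cols])
# out is a dictionary of 48 DataFrames, keyed by the joined column names (e.g. 'A1;B1;C1;D1;E1;F1')
# df is the original 13-column DataFrame
out = {';'.join(cols): df[list(cols)] for cols in column_combos}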
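
If the column names follow a pattern, the per-type lists do not have to be typed by hand. The following is a minimal sketch, not the only way to do it: it assumes that a column's type is the first letter of its name (as in the example names from the question), builds a small made-up DataFrame for illustration, and uses `itertools.product` in place of `pd.MultiIndex.from_product` for the Cartesian product. The helper `column_type` and the sample values are hypothetical.

```python
from itertools import groupby, product

import pandas as pd

# Hypothetical sample data: the 13 example column names, 2 rows each.
names = ["A1", "A2", "A3", "B1", "C1", "C2", "D1", "D2",
         "E1", "E2", "E3", "E4", "F1"]
df = pd.DataFrame({name: [i, i + 100] for i, name in enumerate(names)})


def column_type(name):
    """Assumption: the type of a column is the first letter of its name."""
    return name[0]


# groupby only merges adjacent items, so sort by type first.
# sorted() is stable, so the original order within each type is kept.
sorted_names = sorted(df.columns, key=column_type)
groups = [list(group) for _, group in groupby(sorted_names, key=column_type)]

# Cartesian product: every way of picking one column name per type.
column_combos = list(product(*groups))

# One DataFrame per combination, keyed by the joined column names.
out = {";".join(cols): df[list(cols)] for cols in column_combos}
```

With the example names, `out` holds 48 DataFrames of 6 columns each. If only one random combination is needed rather than all of them, `random.choice(column_combos)` from the standard library picks a single tuple of column names.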
User contributions licensed under: CC BY-SA