
How to check if all possible combinations of columns exist in dataframe (Pandas)?

I have the following dataframe

    A   B   ... 
0   1   1 
1   1   2
2   1   3
0   2   1 
1   2   2
2   2   3

I would like to check whether the dataframe contains every possible combination of the unique entries in each column. In the dataframe above this is the case: A = {1,2}, B = {1,2,3}, and all six combinations are present. The following example should result in False, because the combination (2, 2) is missing.

    A   B 
0   1   1 
1   1   2
0   2   1 

The number of columns should be flexible.

Many thanks for your help!


Answer

from itertools import product

import pandas as pd

df = pd.DataFrame({'A': [1,1,1,2,2,2],
                   'B': [1,2,3,1,2,3]})

Create a data frame with all combinations of unique values in all columns

uniques = [df[i].unique().tolist() for i in df.columns]
df_combo = pd.DataFrame(product(*uniques), columns = df.columns)
print(df_combo)

   A  B
0  1  1
1  1  2
2  1  3
3  2  1
4  2  2
5  2  3
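Since df_combo has exactly one row per combination, its length is the product of the number of unique values in each column. If only a yes/no answer is needed, a count-based check may avoid building df_combo at all. The following is a minimal sketch, not part of the original answer; the function name `has_all_combinations_by_count` is made up for illustration.

```python
import math

import pandas as pd


def has_all_combinations_by_count(df: pd.DataFrame) -> bool:
    """Compare the number of distinct rows with the number of possible combinations."""
    # Product of the per-column unique counts (NaN counted as a value).
    n_expected = math.prod(int(n) for n in df.nunique(dropna=False))
    # Distinct rows actually present; duplicates must not be counted twice.
    n_present = len(df.drop_duplicates())
    return n_present == n_expected


full = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                     'B': [1, 2, 3, 1, 2, 3]})
partial = pd.DataFrame({'A': [1, 1, 2],
                        'B': [1, 2, 1]})
print(has_all_combinations_by_count(full))
print(has_all_combinations_by_count(partial))
```

This works because every distinct row is necessarily one of the possible combinations, so matching counts imply that none is missing.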

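Test whether the two dataframes are equal. Note that equals also compares row order, index and dtypes, so this check is only reliable when df has no duplicate rows, is ordered like df_combo and has a default 0..n-1 index.

df.equals(df_combo)
True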
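If the rows of df may be shuffled or repeated, an order-independent comparison is safer. One possible sketch, assuming the columns hold hashable values without NaN, compares sets of row tuples instead of calling equals. The helper name `has_all_combinations` is an illustrative choice, not something from pandas.

```python
from itertools import product

import pandas as pd


def has_all_combinations(df: pd.DataFrame) -> bool:
    """Return True if df contains every combination of its columns' unique values."""
    # Unique values per column, in order of first appearance.
    uniques = [df[col].unique().tolist() for col in df.columns]
    # Every combination the dataframe would need to contain.
    expected = set(product(*uniques))
    # Rows actually present as plain tuples; a set ignores order and duplicates.
    present = set(df.itertuples(index=False, name=None))
    return expected == present


full = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2],
                     'B': [1, 2, 3, 1, 2, 3]})
shuffled = full.sample(frac=1, random_state=0)  # same rows, different order
partial = pd.DataFrame({'A': [1, 1, 2],
                        'B': [1, 2, 1]})
print(has_all_combinations(full))
print(has_all_combinations(shuffled))
print(has_all_combinations(partial))
```

The number of columns is flexible here, since the list of unique values is built from df.columns.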

For a scenario that returns False (df_combo is rebuilt from the new df in the same way as above):

df = pd.DataFrame({'A': [1,1,2],
                   'B': [1,2,1]})

df_combo
   A  B
0  1  1
1  1  2
2  2  1
3  2  2

df.equals(df_combo)
False
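When the check fails, it can be useful to see which combinations are absent. A possible sketch uses a left merge with indicator=True and keeps the rows found only in the combination table. The function name `missing_combinations` is hypothetical, and the sketch assumes df has no column already called `_merge`.

```python
from itertools import product

import pandas as pd


def missing_combinations(df: pd.DataFrame) -> pd.DataFrame:
    """Return the combinations of unique column values that do not occur in df."""
    cols = list(df.columns)
    uniques = [df[col].unique().tolist() for col in cols]
    df_combo = pd.DataFrame(list(product(*uniques)), columns=cols)
    # indicator=True adds a '_merge' column telling where each row was found.
    merged = df_combo.merge(df.drop_duplicates(), on=cols,
                            how='left', indicator=True)
    # 'left_only' rows exist in df_combo but not in df.
    missing = merged.loc[merged['_merge'] == 'left_only', cols]
    return missing.reset_index(drop=True)


df = pd.DataFrame({'A': [1, 1, 2],
                   'B': [1, 2, 1]})
missing = missing_combinations(df)
print(missing)
```

For the dataframe above, the only missing combination is A = 2, B = 2; an empty result means all combinations are present.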
User contributions licensed under: CC BY-SA