I need to create a Pandas DataFrame that contains two columns: Combination - contains tuples that describe a combination of products in the binary table (e.g., ("bread", "eggs")) Count - contains the number of times that this combination appeared in the binary table The binary table that I keep referring to looks like this. 1 just means that the product

How to count the number of times a combination appears in a binary table in Python?

I need to create a Pandas DataFrame that contains two columns:

Combination – contains tuples that describe a combination of products in the binary table (e.g., (“bread”, “eggs”))
Count – contains the number of times that this combination appeared in the binary table

The binary table that I keep referring to looks like this. 1 just means that the product is in that specific product (or in this case, it is present in the combination), and 0 if otherwise.

      bread  cheese  eggs  flour  jam
1000    0      0       1     0     0
1001    1      0       0     0     0
1002    1      0       1     1     0
1003    1      0       1     0     1
1004    0      0       1     0     0
  ...  ...    ...     ...   ...   ...
1495    1      0       1     1     0
1496    1      1       1     0     0
1497    0      0       0     0     1
1498    1      0       0     0     0
1499    1      0       1     0     0
500 rows × 5 columns

JavaScript
​x
 
      bread  cheese  eggs  flour  jam
1000    0      0       1     0     0
1001    1      0       0     0     0
1002    1      0       1     1     0
1003    1      0       1     0     1
1004    0      0       1     0     0
  ...  ...    ...     ...   ...   ...
1495    1      0       1     1     0
1496    1      1       1     0     0
1497    0      0       0     0     1
1498    1      0       0     0     0
1499    1      0       1     0     0
500 rows × 5 columns
​

I already figured out how to create the combinations column, I just don’t know how to create the count column using the data from the binary table. Here’s my code so far:

import pandas as pd
import itertools

combinations_list = []
products = ["bread","cheese","eggs","flour","jam"]

for p in range(2, len(products)+1):
    for subset in itertools.combinations(products, p):
        combinations_list.append(str(subset))

#code for count column here

report = pd.DataFrame(combinations_list,columns=['Combinations']) 
report

JavaScript
 
import pandas as pd
import itertools
​
combinations_list = []
products = ["bread","cheese","eggs","flour","jam"]
​
for p in range(2, len(products)+1):
    for subset in itertools.combinations(products, p):
        combinations_list.append(str(subset))
​
#code for count column here
​
report = pd.DataFrame(combinations_list,columns=['Combinations']) 
report
​

This is what the code looks like but I still need to add the count column.

>>
    Combinations
0   ('bread', 'cheese')
1   ('bread', 'eggs')
2   ('bread', 'flour')
3   ('bread', 'jam')
4   ('cheese', 'eggs')
5   ('cheese', 'flour')
6   ('cheese', 'jam')
7   ('eggs', 'flour')
8   ('eggs', 'jam')
9   ('flour', 'jam')
10  ('bread', 'cheese', 'eggs')
11  ('bread', 'cheese', 'flour')
12  ('bread', 'cheese', 'jam')
13  ('bread', 'eggs', 'flour')
14  ('bread', 'eggs', 'jam')
15  ('bread', 'flour', 'jam')
16  ('cheese', 'eggs', 'flour')
17  ('cheese', 'eggs', 'jam')
18  ('cheese', 'flour', 'jam')
19  ('eggs', 'flour', 'jam')
20  ('bread', 'cheese', 'eggs', 'flour')
21  ('bread', 'cheese', 'eggs', 'jam')
22  ('bread', 'cheese', 'flour', 'jam')
23  ('bread', 'eggs', 'flour', 'jam')
24  ('cheese', 'eggs', 'flour', 'jam')
25  ('bread', 'cheese', 'eggs', 'flour', 'jam')

JavaScript
 
>>
    Combinations
 ('bread', 'cheese')
 ('bread', 'eggs')
 ('bread', 'flour')
 ('bread', 'jam')
 ('cheese', 'eggs')
 ('cheese', 'flour')
 ('cheese', 'jam')
 ('eggs', 'flour')
 ('eggs', 'jam')
 ('flour', 'jam')
('bread', 'cheese', 'eggs')
('bread', 'cheese', 'flour')
('bread', 'cheese', 'jam')
('bread', 'eggs', 'flour')
('bread', 'eggs', 'jam')
('bread', 'flour', 'jam')
('cheese', 'eggs', 'flour')
('cheese', 'eggs', 'jam')
('cheese', 'flour', 'jam')
('eggs', 'flour', 'jam')
('bread', 'cheese', 'eggs', 'flour')
('bread', 'cheese', 'eggs', 'jam')
('bread', 'cheese', 'flour', 'jam')
('bread', 'eggs', 'flour', 'jam')
('cheese', 'eggs', 'flour', 'jam')
('bread', 'cheese', 'eggs', 'flour', 'jam')
​

Can anyone please help me? Thank you!

Answer

Here is one solution:

d={}
for x in range(2,len(df.columns)+1):
    for y in itertools.combinations(df.columns,x):
        d[y]=0

for x in range(2,len(df.columns)+1):
    for i in df.index:
        s=[k for k in df.columns if df.loc[i, k]==1]
        p=[j for j in itertools.combinations(s,x)]
        for w in p:
            d[w]+=1

res=pd.DataFrame({'comb':d.keys(), 'count':d.values()})

JavaScript
 
d={}
for x in range(2,len(df.columns)+1):
    for y in itertools.combinations(df.columns,x):
        d[y]=0
​
for x in range(2,len(df.columns)+1):
    for i in df.index:
        s=[k for k in df.columns if df.loc[i, k]==1]
        p=[j for j in itertools.combinations(s,x)]
        for w in p:
            d[w]+=1
​
res=pd.DataFrame({'comb':d.keys(), 'count':d.values()})
​

For the visible part of your df (the rows you provided in the question) this code returns:

>>>print(res)
                                 comb  count
0                     (bread, cheese)      1
1                       (bread, eggs)      5
2                      (bread, flour)      2
3                        (bread, jam)      1
4                      (cheese, eggs)      1
5                     (cheese, flour)      0
6                       (cheese, jam)      0
7                       (eggs, flour)      2
8                         (eggs, jam)      1
9                        (flour, jam)      0
10              (bread, cheese, eggs)      1
11             (bread, cheese, flour)      0
12               (bread, cheese, jam)      0
13               (bread, eggs, flour)      2
14                 (bread, eggs, jam)      1
15                (bread, flour, jam)      0
16              (cheese, eggs, flour)      0
17                (cheese, eggs, jam)      0
18               (cheese, flour, jam)      0
19                 (eggs, flour, jam)      0
20       (bread, cheese, eggs, flour)      0
21         (bread, cheese, eggs, jam)      0
22        (bread, cheese, flour, jam)      0
23          (bread, eggs, flour, jam)      0
24         (cheese, eggs, flour, jam)      0
25  (bread, cheese, eggs, flour, jam)      0

JavaScript
 
>>>print(res)
                                 comb  count
                   (bread, cheese)      1
                     (bread, eggs)      5
                    (bread, flour)      2
                      (bread, jam)      1
                    (cheese, eggs)      1
                   (cheese, flour)      0
                     (cheese, jam)      0
                     (eggs, flour)      2
                       (eggs, jam)      1
                      (flour, jam)      0
            (bread, cheese, eggs)      1
           (bread, cheese, flour)      0
             (bread, cheese, jam)      0
             (bread, eggs, flour)      2
               (bread, eggs, jam)      1
              (bread, flour, jam)      0
            (cheese, eggs, flour)      0
              (cheese, eggs, jam)      0
             (cheese, flour, jam)      0
               (eggs, flour, jam)      0
     (bread, cheese, eggs, flour)      0
       (bread, cheese, eggs, jam)      0
      (bread, cheese, flour, jam)      0
        (bread, eggs, flour, jam)      0
       (cheese, eggs, flour, jam)      0
(bread, cheese, eggs, flour, jam)      0
​

Advertisement

Answer