Skip to content
Advertisement

How to count the number of times a combination appears in a binary table in Python?

I need to create a Pandas DataFrame that contains two columns:

  1. Combination – contains tuples that describe a combination of products in the binary table (e.g., (“bread”, “eggs”))
  2. Count – contains the number of times that this combination appeared in the binary table

The binary table that I keep referring to looks like this. 1 just means that the product is in that specific product (or in this case, it is present in the combination), and 0 if otherwise.

      bread  cheese  eggs  flour  jam
1000    0      0       1     0     0
1001    1      0       0     0     0
1002    1      0       1     1     0
1003    1      0       1     0     1
1004    0      0       1     0     0
  ...  ...    ...     ...   ...   ...
1495    1      0       1     1     0
1496    1      1       1     0     0
1497    0      0       0     0     1
1498    1      0       0     0     0
1499    1      0       1     0     0
500 rows × 5 columns

I already figured out how to create the combinations column, I just don’t know how to create the count column using the data from the binary table. Here’s my code so far:

import pandas as pd
import itertools

combinations_list = []
products = ["bread","cheese","eggs","flour","jam"]

for p in range(2, len(products)+1):
    for subset in itertools.combinations(products, p):
        combinations_list.append(str(subset))

#code for count column here

report = pd.DataFrame(combinations_list,columns=['Combinations']) 
report

This is what the code looks like but I still need to add the count column.

>>
    Combinations
0   ('bread', 'cheese')
1   ('bread', 'eggs')
2   ('bread', 'flour')
3   ('bread', 'jam')
4   ('cheese', 'eggs')
5   ('cheese', 'flour')
6   ('cheese', 'jam')
7   ('eggs', 'flour')
8   ('eggs', 'jam')
9   ('flour', 'jam')
10  ('bread', 'cheese', 'eggs')
11  ('bread', 'cheese', 'flour')
12  ('bread', 'cheese', 'jam')
13  ('bread', 'eggs', 'flour')
14  ('bread', 'eggs', 'jam')
15  ('bread', 'flour', 'jam')
16  ('cheese', 'eggs', 'flour')
17  ('cheese', 'eggs', 'jam')
18  ('cheese', 'flour', 'jam')
19  ('eggs', 'flour', 'jam')
20  ('bread', 'cheese', 'eggs', 'flour')
21  ('bread', 'cheese', 'eggs', 'jam')
22  ('bread', 'cheese', 'flour', 'jam')
23  ('bread', 'eggs', 'flour', 'jam')
24  ('cheese', 'eggs', 'flour', 'jam')
25  ('bread', 'cheese', 'eggs', 'flour', 'jam')

Can anyone please help me? Thank you!

Advertisement

Answer

Here is one solution:

d={}
for x in range(2,len(df.columns)+1):
    for y in itertools.combinations(df.columns,x):
        d[y]=0

for x in range(2,len(df.columns)+1):
    for i in df.index:
        s=[k for k in df.columns if df.loc[i, k]==1]
        p=[j for j in itertools.combinations(s,x)]
        for w in p:
            d[w]+=1

res=pd.DataFrame({'comb':d.keys(), 'count':d.values()})

For the visible part of your df (the rows you provided in the question) this code returns:

>>>print(res)
                                 comb  count
0                     (bread, cheese)      1
1                       (bread, eggs)      5
2                      (bread, flour)      2
3                        (bread, jam)      1
4                      (cheese, eggs)      1
5                     (cheese, flour)      0
6                       (cheese, jam)      0
7                       (eggs, flour)      2
8                         (eggs, jam)      1
9                        (flour, jam)      0
10              (bread, cheese, eggs)      1
11             (bread, cheese, flour)      0
12               (bread, cheese, jam)      0
13               (bread, eggs, flour)      2
14                 (bread, eggs, jam)      1
15                (bread, flour, jam)      0
16              (cheese, eggs, flour)      0
17                (cheese, eggs, jam)      0
18               (cheese, flour, jam)      0
19                 (eggs, flour, jam)      0
20       (bread, cheese, eggs, flour)      0
21         (bread, cheese, eggs, jam)      0
22        (bread, cheese, flour, jam)      0
23          (bread, eggs, flour, jam)      0
24         (cheese, eggs, flour, jam)      0
25  (bread, cheese, eggs, flour, jam)      0
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement