I need to create a Pandas DataFrame that contains two columns:
- Combination – contains tuples that describe a combination of products in the binary table (e.g., (“bread”, “eggs”))
- Count – contains the number of times that this combination appeared in the binary table
The binary table that I keep referring to looks like this. 1 just means that the product is in that specific product (or in this case, it is present in the combination), and 0 if otherwise.
bread cheese eggs flour jam 1000 0 0 1 0 0 1001 1 0 0 0 0 1002 1 0 1 1 0 1003 1 0 1 0 1 1004 0 0 1 0 0 ... ... ... ... ... ... 1495 1 0 1 1 0 1496 1 1 1 0 0 1497 0 0 0 0 1 1498 1 0 0 0 0 1499 1 0 1 0 0 500 rows × 5 columns
I already figured out how to create the combinations column, I just don’t know how to create the count column using the data from the binary table. Here’s my code so far:
import pandas as pd
import itertools
combinations_list = []
products = ["bread","cheese","eggs","flour","jam"]
for p in range(2, len(products)+1):
for subset in itertools.combinations(products, p):
combinations_list.append(str(subset))
#code for count column here
report = pd.DataFrame(combinations_list,columns=['Combinations'])
report
This is what the code looks like but I still need to add the count column.
>>
Combinations
0 ('bread', 'cheese')
1 ('bread', 'eggs')
2 ('bread', 'flour')
3 ('bread', 'jam')
4 ('cheese', 'eggs')
5 ('cheese', 'flour')
6 ('cheese', 'jam')
7 ('eggs', 'flour')
8 ('eggs', 'jam')
9 ('flour', 'jam')
10 ('bread', 'cheese', 'eggs')
11 ('bread', 'cheese', 'flour')
12 ('bread', 'cheese', 'jam')
13 ('bread', 'eggs', 'flour')
14 ('bread', 'eggs', 'jam')
15 ('bread', 'flour', 'jam')
16 ('cheese', 'eggs', 'flour')
17 ('cheese', 'eggs', 'jam')
18 ('cheese', 'flour', 'jam')
19 ('eggs', 'flour', 'jam')
20 ('bread', 'cheese', 'eggs', 'flour')
21 ('bread', 'cheese', 'eggs', 'jam')
22 ('bread', 'cheese', 'flour', 'jam')
23 ('bread', 'eggs', 'flour', 'jam')
24 ('cheese', 'eggs', 'flour', 'jam')
25 ('bread', 'cheese', 'eggs', 'flour', 'jam')
Can anyone please help me? Thank you!
Advertisement
Answer
Here is one solution:
d={}
for x in range(2,len(df.columns)+1):
for y in itertools.combinations(df.columns,x):
d[y]=0
for x in range(2,len(df.columns)+1):
for i in df.index:
s=[k for k in df.columns if df.loc[i, k]==1]
p=[j for j in itertools.combinations(s,x)]
for w in p:
d[w]+=1
res=pd.DataFrame({'comb':d.keys(), 'count':d.values()})
For the visible part of your df (the rows you provided in the question) this code returns:
>>>print(res)
comb count
0 (bread, cheese) 1
1 (bread, eggs) 5
2 (bread, flour) 2
3 (bread, jam) 1
4 (cheese, eggs) 1
5 (cheese, flour) 0
6 (cheese, jam) 0
7 (eggs, flour) 2
8 (eggs, jam) 1
9 (flour, jam) 0
10 (bread, cheese, eggs) 1
11 (bread, cheese, flour) 0
12 (bread, cheese, jam) 0
13 (bread, eggs, flour) 2
14 (bread, eggs, jam) 1
15 (bread, flour, jam) 0
16 (cheese, eggs, flour) 0
17 (cheese, eggs, jam) 0
18 (cheese, flour, jam) 0
19 (eggs, flour, jam) 0
20 (bread, cheese, eggs, flour) 0
21 (bread, cheese, eggs, jam) 0
22 (bread, cheese, flour, jam) 0
23 (bread, eggs, flour, jam) 0
24 (cheese, eggs, flour, jam) 0
25 (bread, cheese, eggs, flour, jam) 0