I need to create a Pandas DataFrame that contains two columns:
- Combination – contains tuples that describe a combination of products in the binary table (e.g., (“bread”, “eggs”))
- Count – contains the number of times that this combination appeared in the binary table
The binary table that I keep referring to looks like this. 1 just means that the product is in that specific product (or in this case, it is present in the combination), and 0 if otherwise.
JavaScript
x
14
14
1
bread cheese eggs flour jam
2
1000 0 0 1 0 0
3
1001 1 0 0 0 0
4
1002 1 0 1 1 0
5
1003 1 0 1 0 1
6
1004 0 0 1 0 0
7
8
1495 1 0 1 1 0
9
1496 1 1 1 0 0
10
1497 0 0 0 0 1
11
1498 1 0 0 0 0
12
1499 1 0 1 0 0
13
500 rows × 5 columns
14
I already figured out how to create the combinations column, I just don’t know how to create the count column using the data from the binary table. Here’s my code so far:
JavaScript
1
15
15
1
import pandas as pd
2
import itertools
3
4
combinations_list = []
5
products = ["bread","cheese","eggs","flour","jam"]
6
7
for p in range(2, len(products)+1):
8
for subset in itertools.combinations(products, p):
9
combinations_list.append(str(subset))
10
11
#code for count column here
12
13
report = pd.DataFrame(combinations_list,columns=['Combinations'])
14
report
15
This is what the code looks like but I still need to add the count column.
JavaScript
1
29
29
1
>>
2
Combinations
3
0 ('bread', 'cheese')
4
1 ('bread', 'eggs')
5
2 ('bread', 'flour')
6
3 ('bread', 'jam')
7
4 ('cheese', 'eggs')
8
5 ('cheese', 'flour')
9
6 ('cheese', 'jam')
10
7 ('eggs', 'flour')
11
8 ('eggs', 'jam')
12
9 ('flour', 'jam')
13
10 ('bread', 'cheese', 'eggs')
14
11 ('bread', 'cheese', 'flour')
15
12 ('bread', 'cheese', 'jam')
16
13 ('bread', 'eggs', 'flour')
17
14 ('bread', 'eggs', 'jam')
18
15 ('bread', 'flour', 'jam')
19
16 ('cheese', 'eggs', 'flour')
20
17 ('cheese', 'eggs', 'jam')
21
18 ('cheese', 'flour', 'jam')
22
19 ('eggs', 'flour', 'jam')
23
20 ('bread', 'cheese', 'eggs', 'flour')
24
21 ('bread', 'cheese', 'eggs', 'jam')
25
22 ('bread', 'cheese', 'flour', 'jam')
26
23 ('bread', 'eggs', 'flour', 'jam')
27
24 ('cheese', 'eggs', 'flour', 'jam')
28
25 ('bread', 'cheese', 'eggs', 'flour', 'jam')
29
Can anyone please help me? Thank you!
Advertisement
Answer
Here is one solution:
JavaScript
1
14
14
1
d={}
2
for x in range(2,len(df.columns)+1):
3
for y in itertools.combinations(df.columns,x):
4
d[y]=0
5
6
for x in range(2,len(df.columns)+1):
7
for i in df.index:
8
s=[k for k in df.columns if df.loc[i, k]==1]
9
p=[j for j in itertools.combinations(s,x)]
10
for w in p:
11
d[w]+=1
12
13
res=pd.DataFrame({'comb':d.keys(), 'count':d.values()})
14
For the visible part of your df (the rows you provided in the question) this code returns:
JavaScript
1
29
29
1
>>>print(res)
2
comb count
3
0 (bread, cheese) 1
4
1 (bread, eggs) 5
5
2 (bread, flour) 2
6
3 (bread, jam) 1
7
4 (cheese, eggs) 1
8
5 (cheese, flour) 0
9
6 (cheese, jam) 0
10
7 (eggs, flour) 2
11
8 (eggs, jam) 1
12
9 (flour, jam) 0
13
10 (bread, cheese, eggs) 1
14
11 (bread, cheese, flour) 0
15
12 (bread, cheese, jam) 0
16
13 (bread, eggs, flour) 2
17
14 (bread, eggs, jam) 1
18
15 (bread, flour, jam) 0
19
16 (cheese, eggs, flour) 0
20
17 (cheese, eggs, jam) 0
21
18 (cheese, flour, jam) 0
22
19 (eggs, flour, jam) 0
23
20 (bread, cheese, eggs, flour) 0
24
21 (bread, cheese, eggs, jam) 0
25
22 (bread, cheese, flour, jam) 0
26
23 (bread, eggs, flour, jam) 0
27
24 (cheese, eggs, flour, jam) 0
28
25 (bread, cheese, eggs, flour, jam) 0
29