I'd like to create a third column from a cross join of my columns A and B:
```python
import pandas as pd
import numpy as np

df = pd.read_csv("data.csv", sep=",")
df
#    A      B
# 0  0    Yes
# 1  8     No
# 2  2    Yes
# 3  4  Maybe
# 4  6     NA
```
They have the following unique values:
```python
>>> df['A'].drop_duplicates()
0       0
2       8
41      4
119     2
1246    3
1808    1
Name: A, dtype: int64

>>> df['B'].drop_duplicates()
0          NA
2       Maybe
320        No
5575      Yes
Name: B, dtype: object
```
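The expected number of categories can be checked in code. The small frame below is a hypothetical stand-in for data.csv built from the unique values listed above, and it assumes `read_csv` turned the literal `NA` into `NaN` (its default behaviour, consistent with the `NaN` shown in the answer's output). `nunique()` skips `NaN` unless `dropna=False` is passed, so a plain call would report 3 values for B instead of 4.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv, using the unique values listed above
df = pd.DataFrame({
    "A": [0, 8, 4, 2, 3, 1],
    "B": ["Yes", "No", "Maybe", np.nan, "Yes", "No"],
})

# nunique() drops NaN by default; dropna=False counts the "NA" category too
n_a = df["A"].nunique()
n_b = df["B"].nunique(dropna=False)
print(n_a, n_b, n_a * n_b)  # 6 4 24
```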
I'd like df['C'] to hold one category per combination of A and B, so there should be 6 * 4 = 24 unique values in it:
```
# Column C should have 6 * 4 classes:

(1, Yes)=1   (1, No)=6   (1, Maybe)=12   (1, NA)=18
(2, Yes)=2   (2, No)=7   (2, Maybe)=13
(3, Yes)=3   (3, No)=8
(4, Yes)=4   (4, No)=9
(8, Yes)=5
(0, Yes)=0
```
Thus we should have the following:
```python
Newdf
#    A      B   C
# 0  0    Yes   0
# 1  8     No   9
# 2  2    Yes   2
# 3  4  Maybe  15
# 4  8     NA  22
```
Using this approach, I get the following error:
```python
out = df.merge(
    df[['B']].drop_duplicates()
    .merge(df['A'].drop_duplicates(), how='cross')
    .assign(C=lambda x: x.index + 1)
)
```
It throws:

```
"No common columns to perform merge on. "
pandas.errors.MergeError: No common columns to perform merge on. Merge options: left_on=None, right_on=None, left_index=False, right_index=False
```
Any help would be appreciated.
Answer
Why not use good old itertools?
```python
from itertools import product

import pandas as pd

# Every (A, B) pair, in order of first appearance
cats = list(product(df['A'].unique(), df['B'].unique()))

# merge with this
pd.DataFrame(cats, columns=['A', 'B']).assign(C=range(len(cats)))
```
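To finish the "merge with this" step, here is a minimal end-to-end sketch. It assumes the columns are literally named A and B, rebuilds the five sample rows from the question by hand, and assumes `NA` was read as `NaN`. The names `lookup` and `out` are illustrative. A left merge keeps every original row in its original order, and pandas matches `NaN` keys to each other in `merge` (unlike SQL), which is why the `NaN` row still receives a code.

```python
from itertools import product

import numpy as np
import pandas as pd

# The question's sample rows; "NA" is assumed to have been parsed as NaN
df = pd.DataFrame({
    "A": [0, 8, 2, 4, 6],
    "B": ["Yes", "No", "Yes", "Maybe", np.nan],
})

# One row per (A, B) combination, numbered 0..n-1 (same as the answer)
cats = list(product(df["A"].unique(), df["B"].unique()))
lookup = pd.DataFrame(cats, columns=["A", "B"]).assign(C=range(len(cats)))

# Attach each row's combination code; how="left" preserves df's row order
out = df.merge(lookup, on=["A", "B"], how="left")
print(out)
```

With the sample rows, `out["C"]` comes out as 0, 5, 8, 14, 19, matching the numbering in the output table of this answer.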
Output for the example data:
```
     A      B   C
0    0    Yes   0
1    0     No   1
2    0  Maybe   2
3    0    NaN   3
4    8    Yes   4
5    8     No   5
6    8  Maybe   6
7    8    NaN   7
8    2    Yes   8
9    2     No   9
10   2  Maybe  10
11   2    NaN  11
12   4    Yes  12
13   4     No  13
14   4  Maybe  14
15   4    NaN  15
16   6    Yes  16
17   6     No  17
18   6  Maybe  18
19   6    NaN  19
```
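The same lookup table can also be built with pandas' own cross join, `merge(how="cross")`, available since pandas 1.2, which is close to what the question attempted. When `merge` is called without `on=`, it joins on the columns the two frames share and raises `MergeError` if they share none. Passing `on=["A", "B"]` makes the keys explicit, so a column-name mismatch points at the specific missing column. This is a sketch under the same assumptions as above (columns literally named A and B, `NA` read as `NaN`), not a diagnosis of the question's actual data file.

```python
import numpy as np
import pandas as pd

# The question's sample rows; "NA" is assumed to have been parsed as NaN
df = pd.DataFrame({
    "A": [0, 8, 2, 4, 6],
    "B": ["Yes", "No", "Yes", "Maybe", np.nan],
})

# Cross join distinct A values with distinct B values (requires pandas >= 1.2)
lookup = (
    df[["A"]].drop_duplicates()
    .merge(df[["B"]].drop_duplicates(), how="cross")
)
lookup["C"] = range(len(lookup))

# Name the join keys explicitly instead of relying on shared-column inference
out = df.merge(lookup, on=["A", "B"], how="left")
print(out)
```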