I have some combinations like
    (A,B) = 1
    (A,C) = 0
    (A,D) = 1
    (B,C) = 1
    (B,D) = 1
    (C,D) = 0
Any idea how I can efficiently create a 4×4 matrix with these 0/1 values from all these combinations? The result would look something like:
      A B C D
    A - 1 0 1
    B 1 - 1 1
    C 0 1 - 0
    D 1 1 0 -
Answer
Suppose the “combinations” are stored in a file in the following format (or similar):
    A,B,1
    A,C,0
    A,D,1
    B,C,1
    B,D,1
    C,D,0
Then you can do:
    import pandas as pd
    df = pd.read_csv(filename, header=None)
Example (using your sample data):
    import io
    import pandas as pd

    txt = """A,B,1
    A,C,0
    A,D,1
    B,C,1
    B,D,1
    C,D,0
    """
    df = pd.read_csv(io.StringIO(txt), header=None)
Now df contains:
       0  1  2
    0  A  B  1
    1  A  C  0
    2  A  D  1
    3  B  C  1
    4  B  D  1
    5  C  D  0
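If the combinations already live in memory rather than in a file, you can build the same three-column frame directly. This is only a sketch: the `combos` dict and its tuple keys are an assumption about how you might hold the pairs, not something from your question.

```python
import pandas as pd

# Hypothetical in-memory input: (label, label) -> 0/1 value
combos = {
    ("A", "B"): 1,
    ("A", "C"): 0,
    ("A", "D"): 1,
    ("B", "C"): 1,
    ("B", "D"): 1,
    ("C", "D"): 0,
}

# One row per pair; the columns are named 0, 1, 2,
# the same as when reading the file with header=None
df = pd.DataFrame([(a, b, v) for (a, b), v in combos.items()])
```

The resulting `df` has the same layout as the one read from the file, so the rest of the answer applies unchanged.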
From that point, a little bit of massaging will get you what you want:
    # all labels (for rows and cols)
    r = sorted(set(df[0]) | set(df[1]))

    # upper triangular
    z = (
        df.set_index([0, 1])
        .reindex(pd.MultiIndex.from_product([r, r]))
        .squeeze()
        .unstack(1)
    )

    # fill in the lower triangular part to make z symmetric
    z = z.where(~z.isna(), z.T)
We get:
    >>> z
         A    B    C    D
    A  NaN  1.0  0.0  1.0
    B  1.0  NaN  1.0  1.0
    C  0.0  1.0  NaN  0.0
    D  1.0  1.0  0.0  NaN
Note: if you prefer to stay int-only (and set the diagonal to 0), then:
    z = (
        df.set_index([0, 1])
        .reindex(pd.MultiIndex.from_product([r, r]), fill_value=0)
        .squeeze()
        .unstack(1)
    )
    z += z.T
and now:
    >>> z
       A  B  C  D
    A  0  1  0  1
    B  1  0  1  1
    C  0  1  0  0
    D  1  1  0  0
For speed
Now, if you know for sure that you are dealing with 4×4 matrices and that the values arrive exactly in the order you indicated (the upper triangle, row by row), you can do the following for a faster setup:
    import numpy as np

    # get the triangular values, somehow (e.g. read the file and discard
    # all but the last value);
    # here we simply take them from the df above:
    tri = df[2].values  # np.array([1, 0, 1, 1, 1, 0])

    # and now, in pure numpy:
    z = np.zeros((4, 4), dtype=int)
    z[np.triu_indices(4, 1)] = tri  # fill the strict upper triangle, row by row
    z += z.T                        # mirror it into the lower triangle
The result is a simple numpy array (no labels):
    >>> z
    [[0 1 0 1]
     [1 0 1 1]
     [0 1 0 0]
     [1 1 0 0]]
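If you want the speed of the numpy route but still need the row and column labels, one option is to wrap the array in a DataFrame afterwards. The sketch below also takes the size from the label list instead of hard-coding 4. The `labels` list and the `out` name are my own assumptions for illustration, and the values are still assumed to be ordered by the upper triangle, row by row.

```python
import numpy as np
import pandas as pd

# Assumed inputs: the labels in order, and the upper-triangle values
# in row-major order: (A,B), (A,C), (A,D), (B,C), (B,D), (C,D)
labels = ["A", "B", "C", "D"]
tri = np.array([1, 0, 1, 1, 1, 0])

n = len(labels)
# n labels give n*(n-1)/2 unordered pairs; fail early on a mismatch
assert len(tri) == n * (n - 1) // 2

z = np.zeros((n, n), dtype=int)
z[np.triu_indices(n, 1)] = tri  # strict upper triangle
z += z.T                        # mirror into the lower triangle

# Attach the labels to rows and columns
out = pd.DataFrame(z, index=labels, columns=labels)
```

With this, `out.loc["A", "B"]` looks up a pair by label, and the diagonal stays 0 as in the int-only variant.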