
create a matrix from combinations with values

I have some combinations like

(A,B) = 1
(A,C) = 0
(A,D) = 1
(B,C) = 1
(B,D) = 1
(C,D) = 0

Any idea how I can efficiently create a 4×4 matrix with these 0/1 values from all these combinations? The result should look something like this:

  A B C D
A - 1 0 1
B 1 - 1 1
C 0 1 - 0
D 1 1 0 -

Answer

Suppose the combinations are stored in a file in the following format (or something similar):

A,B,1
A,C,0
A,D,1
B,C,1
B,D,1
C,D,0

Then you can load them with:

import pandas as pd

df = pd.read_csv(filename, header=None)  # filename: path to your file

Example (using your sample data):

import io
import pandas as pd

txt = """A,B,1
A,C,0
A,D,1
B,C,1
B,D,1
C,D,0
"""
df = pd.read_csv(io.StringIO(txt), header=None)

Now df contains:

   0  1  2
0  A  B  1
1  A  C  0
2  A  D  1
3  B  C  1
4  B  D  1
5  C  D  0
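If the combinations live in Python rather than in a file, for example in a dict keyed by label pairs (the `pairs` dict below is a hypothetical stand-in for however you actually hold them), here is a minimal sketch that builds a `df` with the same 0/1/2 column layout, so the rest of this answer applies unchanged:

```python
import pandas as pd

# hypothetical in-memory version of the question's combinations
pairs = {
    ("A", "B"): 1,
    ("A", "C"): 0,
    ("A", "D"): 1,
    ("B", "C"): 1,
    ("B", "D"): 1,
    ("C", "D"): 0,
}

# one row per pair: columns 0 and 1 hold the labels, column 2 the value,
# i.e. the same layout that read_csv(..., header=None) produces
df = pd.DataFrame([(a, b, v) for (a, b), v in pairs.items()])
print(df)
```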

From there, a little reshaping gets you what you want:

# all labels (for rows and cols)
r = sorted(set(df[0]) | set(df[1]))

# upper triangle: index by (row, col), add every missing pair as NaN,
# then move the col labels into columns
z = (
    df.set_index([0, 1])
    .reindex(pd.MultiIndex.from_product([r, r]))
    .squeeze()
    .unstack(1)
)

# fill in the lower triangular part to make z symmetric
z = z.where(~z.isna(), z.T)

We get:

>>> z
     A    B    C    D
A  NaN  1.0  0.0  1.0
B  1.0  NaN  1.0  1.0
C  0.0  1.0  NaN  0.0
D  1.0  1.0  0.0  NaN
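If you want the values to print as integers while the diagonal stays visibly empty, as with the `-` marks in the question, one option is pandas' nullable `"Int64"` dtype. This is a sketch that rebuilds `z` as above and then converts it; the `"-"` replacement at the end is for display only, since it turns the frame into object dtype:

```python
import io

import pandas as pd

txt = """A,B,1
A,C,0
A,D,1
B,C,1
B,D,1
C,D,0
"""
df = pd.read_csv(io.StringIO(txt), header=None)
r = sorted(set(df[0]) | set(df[1]))

# same symmetric float matrix as above (NaN on the diagonal)
z = (
    df.set_index([0, 1])
    .reindex(pd.MultiIndex.from_product([r, r]))
    .squeeze()
    .unstack(1)
)
z = z.where(~z.isna(), z.T)

# nullable integers: 0/1 stay integers, the diagonal becomes <NA>
zi = z.astype("Int64")
print(zi)

# display only: object dtype with "-" on the diagonal, like the question
shown = zi.astype(object).where(zi.notna(), "-")
print(shown)
```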

Note: if you prefer to stay int-only (with 0 on the diagonal), fill the missing pairs with 0 instead of NaN and add the transpose:

z = (
    df.set_index([0, 1])
    .reindex(pd.MultiIndex.from_product([r, r]), fill_value=0)
    .squeeze()
    .unstack(1)
)
z += z.T  # diagonal and lower triangle are 0, so this mirrors the upper triangle

and now:

>>> z
   A  B  C  D
A  0  1  0  1
B  1  0  1  1
C  0  1  0  0
D  1  1  0  0
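For comparison, here is a sketch of the same int-only result using `DataFrame.pivot` instead of a MultiIndex `reindex`. This is an alternative technique, not a change to the approach above; note that `pivot` leaves NaN in cells with no pair (such as the diagonal), so those are filled with 0 explicitly before the transpose is added:

```python
import io

import pandas as pd

txt = """A,B,1
A,C,0
A,D,1
B,C,1
B,D,1
C,D,0
"""
df = pd.read_csv(io.StringIO(txt), header=None)
r = sorted(set(df[0]) | set(df[1]))

# rows from column 0, columns from column 1, values from column 2
up = (
    df.pivot(index=0, columns=1, values=2)
    .reindex(index=r, columns=r)  # add labels pivot never saw (row D, column A)
    .fillna(0)                    # diagonal and lower triangle -> 0
    .astype(int)
)

# like z += z.T above: assumes each unordered pair is listed only once
z = up + up.T
print(z)
```

As with `z += z.T`, this relies on each pair appearing once and only in (row, column) order. A pair listed in both orders would be counted twice.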

For speed

Now, if you know for sure that you are dealing with 4×4 matrices and that the values come in exactly the order you listed (the upper triangle, row by row), you can do the following for a faster setup:

import numpy as np

# get the upper-triangle values somehow (e.g. read the file and keep
# only the last value on each line);

# here we simply take them from the df above:
tri = df[2].to_numpy()  # array([1, 0, 1, 1, 1, 0])

# and now, in pure numpy:
z = np.zeros((4, 4), dtype=int)
z[np.triu_indices(4, 1)] = tri  # fill above the diagonal, row by row
z += z.T                        # mirror into the lower triangle
The result is a simple numpy array (no labels):

>>> print(z)
[[0 1 0 1]
 [1 0 1 1]
 [0 1 0 0]
 [1 1 0 0]]
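If you'd rather not hard-code the 4, the size can be recovered from the number of values, since a strict upper triangle of an n×n matrix holds k = n(n−1)/2 entries, so n = (1 + √(1 + 8k)) / 2. Below is a sketch of that idea as a hypothetical helper `tri_to_symmetric` (not a NumPy function), still assuming the values arrive row by row in the order `np.triu_indices` uses. The usage at the end puts labels back with a plain `pd.DataFrame`, assuming you know the labels and their order:

```python
import math

import numpy as np
import pandas as pd


def tri_to_symmetric(tri):
    """Hypothetical helper: build a symmetric n x n matrix from the strict
    upper-triangle values given row by row (np.triu_indices order)."""
    tri = np.asarray(tri)
    k = tri.size
    # k = n * (n - 1) / 2  =>  n = (1 + sqrt(1 + 8k)) / 2
    n = (1 + math.isqrt(1 + 8 * k)) // 2
    if n * (n - 1) // 2 != k:
        raise ValueError(f"{k} values cannot fill the upper triangle of a square matrix")
    z = np.zeros((n, n), dtype=tri.dtype)
    z[np.triu_indices(n, 1)] = tri  # fill above the diagonal
    return z + z.T                  # mirror into the lower triangle


z = tri_to_symmetric([1, 0, 1, 1, 1, 0])
print(z)

# optional: put the labels back (assumes you know them, in order)
labels = ["A", "B", "C", "D"]
print(pd.DataFrame(z, index=labels, columns=labels))
```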