I have some combinations like:

    (A,B) = 1
    (A,C) = 0
    (A,D) = 1
    (B,C) = 1
    (B,D) = 1
    (C,D) = 0
Any idea how I can efficiently create a four-by-four matrix of these 0/1 values from all these combinations? The result should look something like:
      A B C D
    A - 1 0 1
    B 1 - 1 1
    C 0 1 - 0
    D 1 1 0 -
Answer
Suppose the “combinations” are stored in a file in the following format (or similar):

    A,B,1
    A,C,0
    A,D,1
    B,C,1
    B,D,1
    C,D,0
Then you can do:
    import pandas as pd

    # header=None: the file has no header row, so columns are named 0, 1, 2
    df = pd.read_csv(filename, header=None)
Example (using your sample data):
    import io

    import pandas as pd

    txt = """A,B,1
    A,C,0
    A,D,1
    B,C,1
    B,D,1
    C,D,0
    """
    df = pd.read_csv(io.StringIO(txt), header=None)
Now df contains:

       0  1  2
    0  A  B  1
    1  A  C  0
    2  A  D  1
    3  B  C  1
    4  B  D  1
    5  C  D  0
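If the combinations are already in memory rather than in a file, the same frame can probably be built directly. This is a minimal sketch that assumes the data is held as a list of `(label, label, value)` tuples; the name `pairs` is hypothetical and not part of the question.

```python
import pandas as pd

# Assumed in-memory representation of the combinations (hypothetical name)
pairs = [
    ("A", "B", 1),
    ("A", "C", 0),
    ("A", "D", 1),
    ("B", "C", 1),
    ("B", "D", 1),
    ("C", "D", 0),
]

# Without explicit column names, the columns are labelled 0, 1, 2,
# which matches what read_csv(..., header=None) produces
df = pd.DataFrame(pairs)
```

The rest of the answer only relies on the columns being named 0, 1 and 2, so either way of creating `df` should work.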
From that point, a little bit of massaging will get you what you want:
    # all labels (for rows and cols)
    r = sorted(set(df[0]) | set(df[1]))

    # upper triangular
    z = (
        df.set_index([0, 1])
        .reindex(pd.MultiIndex.from_product([r, r]))
        .squeeze()
        .unstack(1)
    )

    # fill in the lower triangular part to make z symmetric
    z = z.where(~z.isna(), z.T)
We get:
    >>> z
         A    B    C    D
    A  NaN  1.0  0.0  1.0
    B  1.0  NaN  1.0  1.0
    C  0.0  1.0  NaN  0.0
    D  1.0  1.0  0.0  NaN
Note: if you prefer to stay int-only (and set the diagonal to 0), then:

    z = (
        df.set_index([0, 1])
        .reindex(pd.MultiIndex.from_product([r, r]), fill_value=0)
        .squeeze()
        .unstack(1)
    )
    z += z.T
and now:
    >>> z
       A  B  C  D
    A  0  1  0  1
    B  1  0  1  1
    C  0  1  0  0
    D  1  1  0  0
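As a quick sanity check, the int-only result can be tested for symmetry and queried by label. The sketch below rebuilds `z` from the sample data so that it runs on its own; the names `is_symmetric`, `ab` and `ba` are only illustrative.

```python
import io

import pandas as pd

txt = """A,B,1
A,C,0
A,D,1
B,C,1
B,D,1
C,D,0
"""
df = pd.read_csv(io.StringIO(txt), header=None)

# Same int-only construction as above
r = sorted(set(df[0]) | set(df[1]))
z = (
    df.set_index([0, 1])
    .reindex(pd.MultiIndex.from_product([r, r]), fill_value=0)
    .squeeze()
    .unstack(1)
)
z += z.T

# The matrix should equal its own transpose
arr = z.to_numpy()
is_symmetric = bool((arr == arr.T).all())

# Label-based lookups work in either order
ab = z.loc["A", "B"]
ba = z.loc["B", "A"]
```

Here `is_symmetric` should be `True`, and `ab` and `ba` should both equal the value given for `(A,B)`.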
For speed
Now, if you know for sure that you are dealing with 4×4 matrices and that the values arrive exactly in the order you indicated (row by row along the upper triangle), you can do the following for a faster setup:
    import numpy as np

    # get the triangular values somehow (e.g. read the file and discard
    # all but the last value on each line);
    # here we simply take them from the df above:
    tri = df[2].values  # np.array([1, 0, 1, 1, 1, 0])

    # and now, in pure numpy:
    z = np.zeros((4, 4), dtype=int)
    z[np.triu_indices(4, 1)] = tri  # fill the strict upper triangle, row by row
    z += z.T                        # mirror it into the lower triangle
The result is a simple numpy array (no labels):

    >>> z
    [[0 1 0 1]
     [1 0 1 1]
     [0 1 0 0]
     [1 1 0 0]]
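If the size is not fixed at 4 or the pairs are not guaranteed to arrive in upper-triangle order, a variant that maps labels to positions first may be more robust. This is only a sketch under the assumption that the combinations are available as `(label, label, value)` tuples; `pairs`, `labels` and `pos` are hypothetical names, not part of the approach above.

```python
import numpy as np

# Assumed input: pairs in arbitrary order (hypothetical name)
pairs = [
    ("C", "D", 0),
    ("A", "B", 1),
    ("B", "D", 1),
    ("A", "C", 0),
    ("B", "C", 1),
    ("A", "D", 1),
]

# Sorted labels define the row/column order; pos maps label -> index
labels = sorted({lab for a, b, _ in pairs for lab in (a, b)})
pos = {lab: i for i, lab in enumerate(labels)}

rows = np.array([pos[a] for a, _, _ in pairs])
cols = np.array([pos[b] for _, b, _ in pairs])
vals = np.array([v for _, _, v in pairs])

# Write each value at (i, j) and (j, i); the diagonal stays 0
n = len(labels)
z = np.zeros((n, n), dtype=int)
z[rows, cols] = vals
z[cols, rows] = vals
```

Row and column `i` of `z` correspond to `labels[i]`, so the labels need to be kept alongside the array if they matter later.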