I am stuck trying to calculate a distance matrix from different binary arrays, and I can only use for loops to solve this…
The problem is the following: imagine I have a binary matrix built from several rows, like this one with dimensions n=3, m=3:
```python
np.matrix([[0, 0, 0], [1, 0, 1], [1, 1, 1]])
```
I would like to obtain the following symmetric matrix, where each entry (i, j) is the number of positions in which row i and row j differ:
```python
np.matrix([[0, 2, 3], [2, 0, 1], [3, 1, 0]])
```
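In other words, entry (i, j) is the Hamming distance between rows i and j. A sketch of that definition, writing A for the n × m binary matrix, D for the result, using 1-based indices and the Iverson bracket [·] (1 if the condition holds, 0 otherwise); the symbols A and D are introduced here only for illustration:

```latex
D_{ij} = \sum_{k=1}^{m} [A_{ik} \neq A_{jk}],
\qquad
D_{12} = [0 \neq 1] + [0 \neq 0] + [0 \neq 1] = 1 + 0 + 1 = 2
```

Since D_{ii} = 0 and D_{ij} = D_{ji} follow directly from the definition, the result always has a zero diagonal and is symmetric.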
I have been trying to do it with two for loops, adding 1 whenever two positions are different (!=), but I cannot figure out how to iterate over those vectors properly…
Any help?
Answer
If I understood correctly, you could do:
```python
import numpy as np

mat = np.matrix([[0, 0, 0], [1, 0, 1], [1, 1, 1]])
nrows, ncols = np.shape(mat)
# Every row is compared with every other row, so the result is nrows x nrows.
result = np.zeros((nrows, nrows), dtype=int)

for r in range(nrows):
    # We only need to compare the upper triangular part of the matrix.
    for i in range(r + 1, nrows):
        for j in range(ncols):
            # The comparison gives True/False, which is added as 1/0.
            result[r, i] += mat[r, j] != mat[i, j]

# Copy the upper triangular part to the lower triangular part to make it symmetric.
result = result + result.T
print(result)
```

```
[[0 2 3]
 [2 0 1]
 [3 1 0]]
```
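If NumPy is off limits entirely (the question only allows for loops), the same nested-loop idea also works on plain Python lists. A minimal sketch, assuming all rows have the same length; `hamming_matrix` is a hypothetical helper name used only for illustration:

```python
def hamming_matrix(rows):
    """Count differing positions between every pair of equal-length rows.

    `hamming_matrix` is a hypothetical name chosen for this sketch.
    """
    n = len(rows)
    # n x n matrix of zeros (n = number of rows, not columns).
    result = [[0] * n for _ in range(n)]
    for r in range(n):
        # Only the upper triangle is computed; each value is mirrored below.
        for i in range(r + 1, n):
            count = 0
            for a, b in zip(rows[r], rows[i]):
                if a != b:
                    count += 1
            result[r][i] = count
            result[i][r] = count
    return result


rows = [[0, 0, 0], [1, 0, 1], [1, 1, 1]]
print(hamming_matrix(rows))  # [[0, 2, 3], [2, 0, 1], [3, 1, 0]]
```

Filling both `result[r][i]` and `result[i][r]` inside the loop replaces the `result + result.T` step of the NumPy version.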
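If you can use at least some NumPy functions, you can iterate over the matrix row by row instead:
```python
result = np.zeros((nrows, nrows), dtype=int)
# Iterating over an np.matrix yields each row as a 1 x ncols matrix.
for i, row in enumerate(mat):
    # mat != row compares this row with every row of mat at once;
    # summing along axis=1 counts the differing positions for each row.
    result[i, :] = np.sum(mat != row, axis=1).transpose()
print(result)
```

```
[[0 2 3]
 [2 0 1]
 [3 1 0]]
```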
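To see what a single iteration of that loop computes, here is a small sketch that picks the second row (index 1) by hand; the names `diff` and `counts` are only for illustration:

```python
import numpy as np

mat = np.matrix([[0, 0, 0], [1, 0, 1], [1, 1, 1]])
row = mat[1]  # second row, kept as a 1 x 3 matrix

# Broadcasting compares `row` with each row of `mat`, position by position.
diff = mat != row
print(diff)

# Summing each row of `diff` counts the differing positions (a 3 x 1 matrix),
# and transposing turns it into a 1 x 3 row.
counts = np.sum(diff, axis=1).transpose()
print(counts)  # [[2 0 1]], the second row of the distance matrix
```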
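If you want to see a neat trick, here is how to do it without any explicit for loop, using NumPy "broadcasting". Indexing with `[:, None]` adds a new axis, so `mat_arr[:, None]` has shape (n, 1, m) while `mat_arr` has shape (n, m). Comparing the two broadcasts them to shape (n, n, m), so every row is compared with every other row, and summing over the last axis counts the differing positions:
```python
# For this trick we need a plain ndarray: np.matrix objects are always 2-D,
# so they cannot represent the 3-D intermediate comparison.
mat_arr = np.asarray(mat)
result_broadcasting = np.sum(mat_arr != mat_arr[:, None], axis=2)
print(result_broadcasting)
```

```
[[0 2 3]
 [2 0 1]
 [3 1 0]]
```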
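The broadcasting step is easier to follow once you look at the intermediate shapes. A short sketch that starts from a plain `np.array` (the names `expanded` and `diff` are just for illustration):

```python
import numpy as np

mat_arr = np.array([[0, 0, 0], [1, 0, 1], [1, 1, 1]])

expanded = mat_arr[:, None]   # inserts a new axis in the middle
diff = mat_arr != expanded    # shapes (3, 3) and (3, 1, 3) broadcast together
print(expanded.shape)  # (3, 1, 3)
print(diff.shape)      # (3, 3, 3)

# diff[i, j, k] is True when rows i and j differ in column k;
# rows 0 and 2 differ in every column, so this slice is all True.
print(diff[0, 2])

# Summing over the last axis (the columns) gives the distance matrix.
print(diff.sum(axis=2))
```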