Skip to content
Advertisement

Construct graph connectivity matrices in COO format

I have faced the following subtask while working with graph data:

I need to construct graph connectivity matrices in COO format for graphs with several fully-connected components from arrays of “border” indices.

As an example, given array

borders = [0, 2, 5]

the resulting COO matrix should be

coo_matrix = [[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
              [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]].

That is, borders array contains ranges of nodes that should form fully-connected subgraphs (starting index included and ending index excluded).

I came up with the following algorithm, but I suspect that the perfomance could be improved:

import numpy as np

def get_coo(borders):

    edge_list = []
    for s, e in zip(borders, borders[1:]):
        
        # create fully-connected subgraph
        arr = np.arange(s, e)
        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
        t = t.T

        edge_list.append(t)

    edge_list = np.concatenate(edge_list, axis=1)

    return edge_list

I feel there may be a faster solution, maybe using some numpy vectorized operations.

Does anyone have any ideas?

Advertisement

Answer

Since your goal is a faster solution than what you have, you can explore itertools for solving this efficiently. This approach benchmarks approximately 25 times faster than your current approach as tested on larger border lists.

import numpy as np
from itertools import product, chain

def get_coo(borders):
    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
    return list(edges)

output = get_coo(borders)

## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.
np.array(output).T
array([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
       [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])

Alternate approaches and benchmarks:

Note: These have been benchmarked on your current small list as well as on a larger border list as generated by borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]

Disjoint union of complete graphs

What you are trying to do is essentially combine disjoint complete graphs. Adjacency matrices for such graphs have complete connections for selective items along it’s diagonal. You can use networkx to solve these.

enter image description here

While slower than your current solution, you will find that working on these graphs objects would be much much easier and rewarding than using NumPy to represent graphs.

Approach 1:

  1. Assuming that nodes are in sequence, calculate the number of nodes in each subgraph as i
  2. Create a complete matrix filled with 1s of the shape i*i
  3. Combine the graphs using nx.disjoint_union_all
  4. Fetch the edges of this graph
import numpy as np
import networkx as nx

def get_coo(borders):
    graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]
    edges = nx.disjoint_union_all(graphs).edges()
    return edges

%timeit get_coo(borders)

#Small- 277 µs ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
#Large- 300 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Approach 2:

  1. Iterate over the rolling 2-gram tuples of the borders using zip
  2. Create nx.complete_graph using ranges(start, end) from these tuples
  3. Combine the graphs using nx.disjoint_union_all
  4. Fetch the edges of this graph
import numpy as np
import networkx as nx

def get_coo(borders):
    graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]
    edges = nx.disjoint_union_all(graphs).edges()
    return edges

%timeit get_coo(borders)

#Small- 116 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 207 ms ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Output is a bit faster than previous but lacks the self-loops on the nodes which would have to be added separately

Using itertools.product

Approach 3:

  1. Iterate over the rolling 2-gram tuples of the borders using zip
  2. Use itertools.product completely connected edge list for each subgraph
  3. Use itertools.chain to “append” the two iterators
  4. Return them as edges
import numpy as np
from itertools import product, chain

def get_coo(borders):
    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])
    return list(edges)

%timeit get_coo(borders)

#Small- 3.91 µs ± 787 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
#Large- 183 µs ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This approach is approximately 25 times faster than your current approach

Your current approach – benchmark

def get_coo(borders):

    edge_list = []
    for s, e in zip(borders, borders[1:]):
        
        # create fully-connected subgraph
        arr = np.arange(s, e)
        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)
        t = t.T

        edge_list.append(t)

    edge_list = np.concatenate(edge_list, axis=1)

    return edge_list

%timeit get_coo(borders)

#Small- 95.1 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
#Large- 3.91 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement