Construct graph connectivity matrices in COO format

Question

I have faced the following subtask while working with graph data: I need to construct graph connectivity matrices in COO format for graphs with several fully-connected components from arrays of "border" indices. As an example, given array the resulting COO matrix should be That is, borders array contains ranges of nodes that should form fully-connected subgraphs (starting index included and

Accepted Answer

Since your goal is a faster solution than what you have, you can explore itertools for solving this efficiently. This approach benchmarks approximately 25 times faster than your current approach as tested on larger border lists.import numpy as npfrom itertools import product, chaindef get_coo(borders):    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])    return list(edges)output = get_coo(borders)## NOTE: Remember to can convert to array and Transpose for all approaches below to get Coo format.np.array(output).Tarray([[0, 0, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],       [0, 1, 0, 1, 2, 3, 4, 2, 3, 4, 2, 3, 4]])Alternate approaches and benchmarks:Note: These have been benchmarked on your current small list as well as on a larger border list as generated by borders = np.arange(300)[np.random.randint(0,2,(300,),dtype=bool)]Disjoint union of complete graphsWhat you are trying to do is essentially combine disjoint complete graphs. Adjacency matrices for such graphs have complete connections for selective items along it&#8217;s diagonal. You can use networkx to solve these.While slower than your current solution, you will find that working on these graphs objects would be much much easier and rewarding than using NumPy to represent graphs.Approach 1:Assuming that nodes are in sequence, calculate the number of nodes in each subgraph as iCreate a complete matrix filled with 1s of the shape i*iCombine the graphs using nx.disjoint_union_allFetch the edges of this graphimport numpy as npimport networkx as nxdef get_coo(borders):    graphs = [nx.from_numpy_matrix(np.ones((i,i))).to_directed() for i in np.diff(borders)]    edges = nx.disjoint_union_all(graphs).edges()    return edges%timeit get_coo(borders)#Small- 277 µs ± 33.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)#Large- 300 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)Approach 2:Iterate over the rolling 2-gram tuples of the borders using zipCreate nx.complete_graph using ranges(start, end) from these tuplesCombine the graphs using nx.disjoint_union_allFetch the edges of this graphimport numpy as npimport networkx as nxdef get_coo(borders):    graphs = [nx.complete_graph(range(*i),nx.DiGraph()) for i in zip(borders, borders[1:])]    edges = nx.disjoint_union_all(graphs).edges()    return edges%timeit get_coo(borders)#Small- 116 µs ± 13.4 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)#Large- 207 ms ± 35.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)Output is a bit faster than previous but lacks the self-loops on the nodes which would have to be added separatelyUsing itertools.productApproach 3:Iterate over the rolling 2-gram tuples of the borders using zipUse itertools.product completely connected edge list for each subgraphUse itertools.chain to &#8220;append&#8221; the two iteratorsReturn them as edgesimport numpy as npfrom itertools import product, chaindef get_coo(borders):    edges = chain(*[product(range(*i),repeat=2) for i in zip(borders, borders[1:])])    return list(edges)%timeit get_coo(borders)#Small- 3.91 µs ± 787 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)#Large- 183 µs ± 21.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)This approach is approximately 25 times faster than your current approachYour current approach &#8211; benchmarkdef get_coo(borders):    edge_list = []    for s, e in zip(borders, borders[1:]):                # create fully-connected subgraph        arr = np.arange(s, e)        t = np.array(np.meshgrid(arr, arr)).T.reshape(-1, 2)        t = t.T        edge_list.append(t)    edge_list = np.concatenate(edge_list, axis=1)    return edge_list%timeit get_coo(borders)#Small- 95.1 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)#Large- 3.91 ms ± 962 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Construct graph connectivity matrices in COO format

Advertisement

Answer

Alternate approaches and benchmarks:

Disjoint union of complete graphs

Using `itertools.product`

Your current approach – benchmark