How to use fromiter and ndenumerate together

Question

I'm currently trying to manually implement a function that represents the KNN graph of a set of points as an incidence matrix. My idea was to take the rows of an affinity matrix (an n x n matrix representing the distances between the n points), enumerate and sort them, then return the indices of the first K elements. The errors I get ...
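For orientation, the approach described here (sort each row of the distance matrix and keep the indices of the first K entries) can be sketched without `enumerate` by using `np.argsort`. This is only a minimal sketch under stated assumptions: the function name `knn_adjacency` and the sample `points` are hypothetical, the diagonal of the matrix is assumed to be zero and the smallest entry of its row, `k` is assumed to be smaller than `n`, and the 0/1 neighbour matrix it returns may not be exactly the "incidence matrix" intended above.

```python
import numpy as np

def knn_adjacency(dist, k):
    """Return (adj, neighbours) for an n x n distance matrix `dist`.

    adj[i, j] == 1 when j is one of the k nearest neighbours of i.
    Hypothetical helper, not taken from the original question.
    """
    n = dist.shape[0]
    # Column indices that would sort each row by increasing distance.
    order = np.argsort(dist, axis=1, kind="stable")
    # Assumption: dist[i, i] == 0 is the smallest entry of row i, so the
    # first column of `order` is the point itself; skip it and keep k more.
    neighbours = order[:, 1:k + 1]
    adj = np.zeros((n, n), dtype=np.int8)
    rows = np.repeat(np.arange(n), k)
    adj[rows, neighbours.ravel()] = 1
    return adj, neighbours

# Four illustrative points on a line.
points = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0], [7.0, 0.0]])
# Pairwise Euclidean distances via broadcasting.
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
adj, neighbours = knn_adjacency(dist, k=2)
```

Each row of `neighbours` lists the K nearest points in order of increasing distance, and `adj` marks the same pairs in matrix form.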

Accepted Answer

`ndenumerate` produces, for each element, an indexing tuple and the value.

    In [163]: x = np.arange(6)
    In [164]: list(np.ndenumerate(x))
    Out[164]: [((0,), 0), ((1,), 1), ((2,), 2), ((3,), 3), ((4,), 4), ((5,), 5)]

That makes more sense when the array is 2d or more. The indexing tuples will have 2 or more values:

    In [165]: list(np.ndenumerate(x.reshape(3,2)))
    Out[165]: [((0, 0), 0), ((0, 1), 1), ((1, 0), 2), ((1, 1), 3), ((2, 0), 4), ((2, 1), 5)]

With 'plain' `enumerate`, you get a 2 element tuple:

    In [166]: list(enumerate(x))
    Out[166]: [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]

With `fromiter` and the compound dtype:

    In [167]: np.fromiter(enumerate(x), dtype=np.dtype("i,f"))
    Out[167]:
    array([(0, 0.), (1, 1.), (2, 2.), (3, 3.), (4, 4.), (5, 5.)],
          dtype=[('f0', '<i4'), ('f1', '<f4')])

The `dtype` shows the full specification that your shorthand produces. With that dtype, you get a structured array, which can be accessed field by field:

    In [169]: _['f0'], _['f1']
    Out[169]:
    (array([0, 1, 2, 3, 4, 5], dtype=int32),
     array([0., 1., 2., 3., 4., 5.], dtype=float32))

I've never seen `fromiter` used with `enumerate`. Admittedly `enumerate`/`ndenumerate` are generators, and `fromiter` is supposed to be the better way of creating an array from generators. Let's try some times:

    In [170]: y = np.random.rand(10000)
    In [171]: timeit np.fromiter(enumerate(y), dtype=np.dtype("i,f"))
    2.39 ms ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [172]: timeit list(enumerate(y))
    1.37 ms ± 41.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Just 'listing' the generator is faster. `ndenumerate` is slower.

    In [173]: timeit list(np.ndenumerate(y))
    4.58 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But if your goal is an array, not just a list, then `fromiter` is faster:

    In [174]: timeit np.array(list(enumerate(y)))
    9.99 ms ± 557 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I can't find the source code for `ndenumerate` (it's buried in some file redirections), but I suspect it uses `ndindex` to create the indexing tuples, and then makes a new tuple from that plus the value:

    In [179]: list(np.ndindex(x.shape))
    Out[179]: [(0,), (1,), (2,), (3,), (4,), (5,)]

    In [180]: list(np.ndindex(3,2))
    Out[180]: [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]

For a 1d array, it's easy to create the index: `np.arange(x.shape[0])`. For higher dimensions, `meshgrid`, `mgrid` etc. can generate all the indexing arrays.

edit

For a 1d array, this function produces the same structured array as your `fromiter`:

    def foo(x):
        n = x.shape[0]
        res = np.empty(n, 'i,f')
        res['f0'] = np.arange(n)
        res['f1'] = x
        return res

    In [216]: foo(x)
    Out[216]:
    array([(0, 0.), (1, 1.), (2, 2.), (3, 3.), (4, 4.), (5, 5.)],
          dtype=[('f0', '<i4'), ('f1', '<f4')])

    In [217]: foo(y)
    Out[217]:
    array([(   0, 0.08351453), (   1, 0.86144197), (   2, 0.6635565 ), ...,
           (9997, 0.52427566), (9998, 0.7808558 ), (9999, 0.5060718 )],
          dtype=[('f0', '<i4'), ('f1', '<f4')])

    In [218]: timeit foo(y)
    51.8 µs ± 1.66 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
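The 1d `foo` above could plausibly be extended to 2d by generating the index arrays with `np.indices` (a close relative of `mgrid`) rather than iterating with `ndenumerate`. The following is a minimal sketch, not part of the original answer: the function name `enumerate_2d` and the field names `row`, `col` and `val` are illustrative assumptions, and the fixed `i4`/`f4` field types simply mirror the `"i,f"` shorthand used earlier.

```python
import numpy as np

def enumerate_2d(a):
    """Structured-array analogue of list(np.ndenumerate(a)) for a 2-D array.

    Hypothetical helper; field names are illustrative choices.
    """
    # np.indices returns one index array per axis, each shaped like `a`.
    rows, cols = np.indices(a.shape)
    res = np.empty(a.size, dtype=[("row", "i4"), ("col", "i4"), ("val", "f4")])
    # ravel() flattens in C order, the same order ndenumerate visits elements.
    res["row"] = rows.ravel()
    res["col"] = cols.ravel()
    res["val"] = a.ravel()
    return res

a = np.arange(6).reshape(3, 2)
res = enumerate_2d(a)
```

As with `foo`, the work is done by whole-array assignments instead of a Python-level loop, so it should scale similarly, though no timings were measured for this variant.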
