Is there a simple way to remove “padding” fields from numpy.dtype.descr?

Question

Context: Since NumPy 1.16, if you index multiple fields of a structured array, the dtype of the resulting array keeps the same itemsize as the original one, which leads to extra "padding": "The new behavior as of Numpy 1.16 leads to extra “padding” bytes at the location of unindexed fields compared to 1.15. You will need to update
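A minimal sketch of the padding described above, assuming NumPy 1.16 or later (the array and field names are just for illustration). It shows that a multi-field index is a view that keeps the full record size, and that the skipped fields turn into unnamed void entries in descr.

```python
import numpy as np

# Four 8-byte fields, so each record occupies 32 bytes.
a = np.zeros(3, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

# Multi-field indexing returns a view that keeps the original 32-byte itemsize.
b = a[['x', 'i']]
print(b.dtype.itemsize)  # 32, not 16
print(b.dtype.names)     # ('x', 'i'): only the selected fields
print(b.dtype.descr)     # the skipped 'y' and 'j' bytes appear as ('', '|V8') padding

# Because b is a view, writing through it changes a.
b['x'][0] = 1.5
print(a['x'][0])         # 1.5
```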

Accepted Answer

Start with a structured array (rf here is numpy.lib.recfunctions):

    In [237]: a = np.array(
         ...:     [
         ...:         (10.0, 13.5, 1248, -2),
         ...:         (20.0, 0.0, 0, 0),
         ...:         (30.0, 0.0, 0, 0),
         ...:         (40.0, 0.0, 0, 0),
         ...:         (50.0, 0.0, 0, 999)
         ...:     ], dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')]
         ...:     )

    In [238]: a
    Out[238]:
    array([(10., 13.5, 1248,  -2), (20.,  0. ,    0,   0),
           (30.,  0. ,    0,   0), (40.,  0. ,    0,   0),
           (50.,  0. ,    0, 999)],
          dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])

The b view:

    In [240]: b = a[['x','i']]

    In [241]: b
    Out[241]:
    array([(10., 1248), (20.,    0), (30.,    0), (40.,    0), (50.,    0)],
          dtype={'names':['x','i'], 'formats':['<f8','<i8'], 'offsets':[0,16], 'itemsize':32})

The repacked copy:

    In [243]: c = rf.repack_fields(b)

    In [244]: c
    Out[244]:
    array([(10., 1248), (20.,    0), (30.,    0), (40.,    0), (50.,    0)],
          dtype=[('x', '<f8'), ('i', '<i8')])

    In [245]: c.dtype
    Out[245]: dtype([('x', '<f8'), ('i', '<i8')])

Your overly padded attempt at adding a field:

    In [247]: d = np.empty(b.shape, dtype=b.dtype.descr + [('c', 'i4')])
         ...: d[list(b.dtype.names)] = b
         ...: d['c'] = 1

    In [248]: d
    Out[248]:
    array([(10., b'\x00\x00\x00\x00\x00\x00\x00\x00', 1248, b'\x00\x00\x00\x00\x00\x00\x00\x00', 1),
           (20., b'\x00\x00\x00\x00\x00\x00\x00\x00',    0, b'\x00\x00\x00\x00\x00\x00\x00\x00', 1),
           ...],
          dtype=[('x', '<f8'), ('f1', 'V8'), ('i', '<i8'), ('f3', 'V8'), ('c', '<i4')])

My first attempt at making a dtype that does not include the void fields. I don't know if simply testing for 'V' is robust enough:

    In [253]: [des for des in b.dtype.descr if not 'V' in des[1]]
    Out[253]: [('x', '<f8'), ('i', '<i8')]

And make a new dtype from that:

    In [254]: d_dtype = _ + [('c','i4')]

All of this is normal Python list and tuple manipulation; I've seen the same kind of thing in other recfunctions.
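To check the worry about testing for 'V', here is a small sketch with a hypothetical record layout that contains a genuine void field (the field name raw is made up for illustration). It compares the format-string test with a test on the field names.

```python
import numpy as np

# Hypothetical layout: 'raw' is a real 4-byte void field, not padding.
a = np.zeros(2, dtype=[('x', '<f8'), ('raw', 'V4'), ('y', '<f8'), ('i', '<i8')])
b = a[['x', 'raw', 'i']]   # dropping 'y' leaves an 8-byte padding gap

descr = b.dtype.descr      # padding entries have an empty name ''

# Filter 1: drop anything whose format string contains 'V'.
by_format = [d for d in descr if 'V' not in d[1]]

# Filter 2: keep only entries whose name is a real field name.
by_name = [d for d in descr if d[0] in b.dtype.names]

print(by_format)  # loses the real 'raw' field
print(by_name)    # keeps 'raw', drops only the unnamed padding
```

So the name test is the safer of the two. One caveat worth checking: a field with a title shows up in descr with a (title, name) tuple as its name, which the name test would also miss; the loop that repack_fields uses, built on dtype.fields, handles titles explicitly.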
I suspect repack_fields builds its new dtype with similar list manipulation. Now we make a new array with the simpler dtype:

    In [255]: d = np.empty(b.shape, dtype=d_dtype)

    In [256]: d[list(b.dtype.names)] = b
         ...: d['c'] = 1

    In [257]: d
    Out[257]:
    array([(10., 1248, 1), (20.,    0, 1), (30.,    0, 1), (40.,    0, 1),
           (50.,    0, 1)], dtype=[('x', '<f8'), ('i', '<i8'), ('c', '<i4')])

I've extracted from repack_fields the code that constructs a new, un-padded dtype:

    In [262]: def foo(a):
         ...:     fieldinfo = []
         ...:     for name in a.names:
         ...:         tup = a.fields[name]
         ...:         fmt = tup[0]
         ...:         if len(tup) == 3:
         ...:             name = (tup[2], name)
         ...:         fieldinfo.append((name, fmt))
         ...:     print(fieldinfo)
         ...:     dt = np.dtype(fieldinfo)
         ...:     return dt
         ...:

    In [263]: foo(b.dtype)
    [('x', dtype('float64')), ('i', dtype('int64'))]
    Out[263]: dtype([('x', '<f8'), ('i', '<i8')])

This works from dtype.fields rather than dtype.descr. One is a dict-like mapping, the other a list:

    In [274]: b.dtype
    Out[274]: dtype({'names':['x','i'], 'formats':['<f8','<i8'], 'offsets':[0,16], 'itemsize':32})

    In [275]: b.dtype.descr
    Out[275]: [('x', '<f8'), ('', '|V8'), ('i', '<i8'), ('', '|V8')]

    In [276]: b.dtype.fields
    Out[276]: mappingproxy({'x': (dtype('float64'), 0), 'i': (dtype('int64'), 16)})

    In [277]: b.dtype.fields['x']
    Out[277]: (dtype('float64'), 0)

Another way of getting just the valid descr tuples from b.dtype:

    In [278]: [des for des in b.dtype.descr if des[0] in b.dtype.names]
    Out[278]: [('x', '<f8'), ('i', '<i8')]
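Putting the pieces together, a minimal sketch that reuses the fields-based loop from repack_fields to append a field without the void padding. The helper name unpadded_dtype is hypothetical, not a NumPy API; the last line assumes rf.repack_fields also accepts a dtype (its documented input is an ndarray or a dtype) and compares the two results.

```python
import numpy as np
import numpy.lib.recfunctions as rf


def unpadded_dtype(dt):
    """Hypothetical helper: rebuild a structured dtype from dt.fields without padding."""
    fieldinfo = []
    for name in dt.names:
        tup = dt.fields[name]          # (dtype, offset) or (dtype, offset, title)
        fmt = tup[0]
        if len(tup) == 3:
            name = (tup[2], name)      # keep the field title, as repack_fields does
        fieldinfo.append((name, fmt))
    return np.dtype(fieldinfo)         # offsets are recomputed, so fields are packed


a = np.zeros(5, dtype=[('x', '<f8'), ('y', '<f8'), ('i', '<i8'), ('j', '<i8')])
a['x'] = [10, 20, 30, 40, 50]
a['i'][0] = 1248
b = a[['x', 'i']]                      # padded view, itemsize 32

# Append a new field 'c' to the unpadded description.
d_dtype = np.dtype(unpadded_dtype(b.dtype).descr + [('c', 'i4')])
d = np.empty(b.shape, dtype=d_dtype)
d[list(b.dtype.names)] = b             # structured assignment copies x and i by position
d['c'] = 1

print(d.dtype)                         # fields x, i, c with no void padding
print(rf.repack_fields(b.dtype) == unpadded_dtype(b.dtype))
```

Working from dtype.fields avoids guessing which descr entries are padding, since fields only ever lists real named fields.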
