python: what are efficient techniques to deal with deeply nested data in a flexible manner?

Question

My question is not about a specific code snippet but more general, so please bear with me: How should I organize the data I'm analyzing, and which tools should I use to manage it? I'm using python and numpy to analyse data. Because the python documentation indicates that dictionaries are very optimized in python, and also due to the fact

Accepted Answer

&#8220;I stored it in a deeply nested dictionary&#8221;And, as you&#8217;ve seen, it doesn&#8217;t work out well.What&#8217;s the alternative? Composite keys and a shallow dictionary. You have an 8-part key:( individual, imaging session, Region imaged, timestamp of file, properties of file, regions of interest in image, format of data, channel of acquisition ) which mapsto an array of values.{ ('AS091209M02', '100113', 'R16', '1263399103', 'Responses', 'N01', 'Sequential', 'Ch1' ): array, ...The issue with this is search.Proper class structures.   Actually, a full-up Class definition may be overkill.  &#8220;The type of operations I perform is for instance to compute properties of the arrays   (listed under Ch1, Ch2), pick up arrays to make a new collection, for instance analyze  responses of N01 from region 16 (R16) of a given individual at different time points, etc.&#8221;RecommendationFirst, use a namedtuple for your ultimate object.Array = namedtuple( 'Array', 'individual, session, region, timestamp, properties, roi, format, channel, data' )Or something like that.  Build a simple list of these named tuple objects.  You can then simply iterate over them.Second, use many simple map-reduce operations on this master list of the array objects.  Filtering:for a in theMasterArrrayList:    if a.region = 'R16' and interest = 'N01':        # do something on these items only.Reducing by Common Key:  individual_dict = defaultdict(list)for a in theMasterArrayList:    individual_dict[ a.individual ].append( a )This will create a subset in the map that has exactly the items you want.You can then do indiidual_dict[&#8216;AS091209M02&#8217;] and have all of their data.  You can do this for any (or all) of the available keys.region_dict = defaultdict(list)for a in theMasterArrayList:    region_dict[ a.region ].append( a )This does not copy any data.  It&#8217;s fast and relatively compact in memory.Mapping (or transforming) the array:for a in theMasterArrayList:    someTransformationFunction( a.data )If the array is itself a list, you&#8217;re can update that list without breaking the tuple as a whole.  If you need to create a new array from an existing array, you&#8217;re creating a new tuple.  There&#8217;s nothing wrong with this, but it is a new tuple.  You wind up with programs like this.def region_filter( array_list, region_set ):    for a in array_list:        if a.region in region_set:            yield adef array_map( array_list, someConstant ):    for a in array_list:        yield Array( *(a[:8] + (someTranformation( a.data, someConstant ),) )def some_result( array_list, region, someConstant ):    for a in array_map( region_filter( array_list, region ), someConstant ):        yield aYou can build up transformations, reductions, mappings into more elaborate things.The most important thing is creating only the dictionaries you need from the master list so you don&#8217;t do any more filtering than is minimally necessary.BTW.  This can be mapped to a relational database trivially.  It will be slower, but you can have multiple concurrent update operations.  Except for multiple concurrent updates, a relational database doesn&#8217;t offer any features above this.

Advertisement

Answer