Skip to content
Advertisement

check if subarray is in array of arrays

I’ve got an array of arrays where I store x,y,z coordinates and a measurement at that coordinate like:

measurements = [[x1,y1,z1,val1],[x2,y2,z2,val2],[...]]

Now before adding a measurement for a certain coordinate I want to check if there is already a measurement for that coordinate. So I can only keep the maximum val measurement.

So the question is:

Is [xn, yn, zn, …] already in measurements

My approach so far would be to iterate over the array and compare with a sclied entry like

for measurement in measurements:
    if measurement_new[:3] == measurement[:3]:
        measurement[3] = measurement_new[3] if measurement_new[3] > measurement[3] else measurement[3]

But with the measurements array getting bigger this is very unefficient.

Another approach would be two separate arrays coords = [[x1,y1,z1], [x2,y2,z2], [...]] and vals = [val1, val2, ...]

This would allow to check for existing coordinates effeciently with [x,y,z] in coords but would have to merge the arrays later on.

Can you suggest a more efficent method for soving this problem?

Advertisement

Answer

If you want to stick to built-in types (if not see last point in Notes below) I suggest using a dict for the measurements:

measurements = {(x1,y1,z1): val1,
                (x2,y2,z2): val2}

Then adding a new value (x,y,z,val) can simply be:

measurements[(x,y,z)] = max(measurements.get((x,y,z), 0), val)

Notes:

  • The value 0 in measurements.get is supposed to be the lower bound of the values you are expecting. If you have values below 0 then change it to an appropriate lower bound such that whenever (x,y,z) is not present in your measures get returns the lower bound and thus max will return val. You can also avoid having to specify the lower bound and write:

    measurements[(x,y,z)] = max(measurements.get((x,y,z), val), val)
    
  • You need to use tuple as type for your keys, hence the (x,y,z). This is because lists cannot be hashed and so not permitted as keys.

  • Finally, depending on the complexity of the task you are performing, consider using more complex data types. I would recommend having a look at pandas DataFrames they are ideal to deal with such kind of things.

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement