I’ve got an array of arrays where I store x,y,z coordinates and a measurement at that coordinate like:
measurements = [[x1,y1,z1,val1],[x2,y2,z2,val2],[...]]
Now before adding a measurement for a certain coordinate I want to check if there is already a measurement for that coordinate. So I can only keep the maximum val measurement.
So the question is:
Is [xn, yn, zn, …] already in measurements
My approach so far would be to iterate over the array and compare with a sclied entry like
for measurement in measurements: if measurement_new[:3] == measurement[:3]: measurement[3] = measurement_new[3] if measurement_new[3] > measurement[3] else measurement[3]
But with the measurements array getting bigger this is very unefficient.
Another approach would be two separate arrays coords = [[x1,y1,z1], [x2,y2,z2], [...]]
and vals = [val1, val2, ...]
This would allow to check for existing coordinates effeciently with [x,y,z] in coords
but would have to merge the arrays later on.
Can you suggest a more efficent method for soving this problem?
Advertisement
Answer
If you want to stick to built-in types (if not see last point in Notes below) I suggest using a dict
for the measurements:
measurements = {(x1,y1,z1): val1, (x2,y2,z2): val2}
Then adding a new value (x,y,z,val
) can simply be:
measurements[(x,y,z)] = max(measurements.get((x,y,z), 0), val)
Notes:
The value
0
inmeasurements.get
is supposed to be the lower bound of the values you are expecting. If you have values below 0 then change it to an appropriate lower bound such that whenever(x,y,z)
is not present in your measuresget
returns the lower bound and thusmax
will returnval
. You can also avoid having to specify the lower bound and write:measurements[(x,y,z)] = max(measurements.get((x,y,z), val), val)
You need to use
tuple
as type for your keys, hence the(x,y,z)
. This is because lists cannot be hashed and so not permitted as keys.Finally, depending on the complexity of the task you are performing, consider using more complex data types. I would recommend having a look at pandas DataFrames they are ideal to deal with such kind of things.