Suppose I have two arrays, where arr2 is always longer than arr1. I’d like to drop values from arr2 so that what remains best fits arr1. If arr2 doesn’t have a value equal to one in arr1, the closest value still available in arr2 should be used instead.
example 1:
    arr1 = [0, 1, 1, 3, 3, 3, 5]
    arr2 = [0, 0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5]
    output: [0, 1, 1, 2, 3, 4, 5]

example 2:
    arr1 = [0, 1, 2, 3, 4, 5]
    arr2 = [1, 1, 3, 3, 3, 3, 4, 4, 5, 5]
    output: [1, 1, 3, 3, 4, 5]  # here output[0] must be 1, because arr2 does not have a 0 value

example 3:
    arr1 = [3, 4, 5, 7]
    arr2 = [0, 0, 2, 2, 2, 5, 8]
    output: [2, 2, 5, 8]
Observing the output of example 1: the first value is 0, and both arrays match at index 0. Next is 1; arr2[1] == 0 is skipped because arr2[2] == 1. The next value is also 1. After that the value is 2: arr2 does not have three 3 values the way arr1 does, so 2, 3, 4 are taken from arr2 to best fit arr1.
How could I get the desired output above from the 2 arrays? This isn’t an interpolation problem. Is there a specific name for this problem? I’m not sure how to look this up.
Answer
find_nearest
Borrowed from here and slightly modified to match your definition: a tie goes to the lower of the two possible values.
Building on this, each time the nearest value is found, it is also removed from the remaining candidates, so no element of arr2 is used twice.
    import numpy as np
    import math

    def find_nearest(array, value):
        # array must be sorted; idx is the first position where array[idx] >= value
        idx = np.searchsorted(array, value, side="left")
        # take the left neighbour if it is at least as close (a tie goes to the lower value)
        if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) <= math.fabs(value - array[idx])):
            return array[idx-1]
        else:
            return array[idx]

    def nearest_no_replacement(array1, array2):
        array = array2.copy()  # keep the caller's list intact
        new_arr = []
        for x in array1:
            c = find_nearest(array, x)
            new_arr.append(c)
            array.remove(c)  # the chosen value can no longer be picked
        return new_arr

    arr1 = [0, 1, 1, 3, 3, 3, 5]
    arr2 = [0, 0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5]
    result1 = nearest_no_replacement(arr1, arr2)

    arr1 = [0, 1, 2, 3, 4, 5]
    arr2 = [1, 1, 3, 3, 3, 3, 4, 4, 5, 5]
    result2 = nearest_no_replacement(arr1, arr2)

    arr1 = [3, 4, 5, 7]
    arr2 = [0, 0, 2, 2, 2, 5, 8]
    result3 = nearest_no_replacement(arr1, arr2)

    print(f'r1: {sorted(result1)}', f'r2: {sorted(result2)}', f'r3: {sorted(result3)}', sep='\n')
Output:
    r1: [0, 1, 1, 2, 3, 4, 5]
    r2: [1, 1, 3, 3, 4, 5]
    r3: [2, 2, 5, 8]
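As a side sketch, the same greedy idea can be written with only the standard library, assuming arr2 is already sorted (as it is in all three examples) and is at least as long as arr1. Here `bisect_left` stands in for `np.searchsorted`, and the name `nearest_no_replacement_bisect` is hypothetical, not part of the code above. Ties still go to the lower value.

```python
from bisect import bisect_left


def nearest_no_replacement_bisect(arr1, arr2):
    """Greedy nearest match without replacement (hypothetical helper).

    Assumes arr2 is sorted ascending and has at least len(arr1) items.
    """
    pool = list(arr2)  # work on a copy so the caller's list is untouched
    picked = []
    for value in arr1:
        idx = bisect_left(pool, value)
        # Prefer the left neighbour when it is at least as close (tie -> lower).
        if idx > 0 and (idx == len(pool)
                        or value - pool[idx - 1] <= pool[idx] - value):
            idx -= 1
        picked.append(pool.pop(idx))  # pop removes the chosen element
    return sorted(picked)


print(nearest_no_replacement_bisect([3, 4, 5, 7], [0, 0, 2, 2, 2, 5, 8]))
```

Popping by index avoids the second scan that `list.remove` does, though the overall result should be the same for sorted input.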
Notes:
- Your second variation output is impossible, since there is no 2 in its arr2.
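A quick way to see which arr1 values have no exact counterpart in arr2 is a multiset difference. This is only an illustrative check, and the helper name `unmatched_values` is an assumption, not something from the question.

```python
from collections import Counter


def unmatched_values(arr1, arr2):
    """Return the arr1 values (with multiplicity) that arr2 cannot match exactly."""
    missing = Counter(arr1) - Counter(arr2)  # subtraction keeps only positive counts
    return sorted(missing.elements())


# Example 2 from the question: 0 and 2 have no exact match in arr2.
print(unmatched_values([0, 1, 2, 3, 4, 5], [1, 1, 3, 3, 3, 3, 4, 4, 5, 5]))
```

Any value this reports has to be replaced by a neighbouring value from arr2, which is what the nearest-value search above takes care of.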