
Find match of numeric arrays Python

I have two arrays; arr2 will always be longer than arr1. I'd like to drop values from arr2 so that what remains best fits arr1. Where arr2 has no value equal to one in arr1, the closest value still available in arr2 should be used instead.

example 1:
arr1 = [0, 1, 1, 3, 3, 3, 5]
arr2 = [0, 0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5]
output: [0, 1, 1, 2, 3, 4, 5]

example 2:
arr1 = [0, 1, 2, 3, 4, 5]
arr2 = [1, 1, 3, 3, 3, 3, 4, 4, 5, 5]
output: [1, 1, 3, 3, 4, 5]
# here output[0] must be 1, because arr2 does not have a 0 value

example 3:
arr1 = [3, 4, 5, 7]
arr2 = [0, 0, 2, 2, 2, 5, 8]
output: [2, 2, 5, 8]

Looking at the output of example 1: the first value is 0, because both arrays match at index 0. The next value is 1: arr2[1] == 0 is skipped because arr2[2] == 1. The value after that is also 1. For the rest, arr2 does not contain three 3s the way arr1 does, so 2, 3, and 4 are taken from arr2 to best fit arr1's three 3s.

How could I get the desired output above from the 2 arrays? This isn’t an interpolation problem. Is there a specific name for this problem? I’m not sure how to look this up.


Answer

find_nearest is adapted from an existing answer and slightly modified to match your rule that a tie goes to the lower of the two possible values.

nearest_no_replacement builds on it: each time it finds the nearest value, it also removes that value from the remaining candidates, so no element of arr2 is used twice.

import numpy as np
import math

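def find_nearest(array, value):
    # array must be sorted; idx is the first position whose element is >= value
    idx = np.searchsorted(array, value, side="left")
    # Take the left neighbour if value lies past the end, or if the left neighbour
    # is at least as close as the right one (<= sends ties to the lower value)
    if idx > 0 and (idx == len(array) or math.fabs(value - array[idx-1]) <= math.fabs(value - array[idx])):
        return array[idx-1]
    else:
        return array[idx]

def nearest_no_replacement(array1, array2):
    array = array2.copy()  # work on a copy so the caller's list is not modified
    new_arr = []
    for x in array1:
        c = find_nearest(array, x)
        new_arr.append(c)
        array.remove(c)  # each value of array2 can be used only once
    return new_arr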

arr1 = [0, 1, 1, 3, 3, 3, 5]
arr2 = [0, 0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5]
result1 = nearest_no_replacement(arr1, arr2)

arr1 = [0, 1, 2, 3, 4, 5]
arr2 = [1, 1, 3, 3, 3, 3, 4, 4, 5, 5]
result2 = nearest_no_replacement(arr1, arr2)

arr1 = [3, 4, 5, 7]
arr2 = [0, 0, 2, 2, 2, 5, 8]
result3 = nearest_no_replacement(arr1, arr2)

print(f'r1: {sorted(result1)}', f'r2: {sorted(result2)}', f'r3: {sorted(result3)}', sep='\n')

Output:

r1: [0, 1, 1, 2, 3, 4, 5]
r2: [1, 1, 3, 3, 4, 5]
r3: [2, 2, 5, 8]

Notes:

  • Your second variation output is impossible, since there is no 2 in its arr2.
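
If you would rather not depend on NumPy, the same greedy idea can be sketched with the standard library's bisect module. Treat this as a rough sketch rather than a drop-in replacement: the name nearest_no_replacement_bisect and its parameter names are my own, and it assumes the second list has at least as many elements as the first (it sorts a copy itself, so the input need not be pre-sorted).

```python
from bisect import bisect_left

def nearest_no_replacement_bisect(targets, candidates):
    """Greedy match: each target takes the closest unused candidate.

    Hypothetical helper for illustration; ties go to the lower value.
    Assumes len(candidates) >= len(targets).
    """
    pool = sorted(candidates)  # sorted copy; the caller's list is untouched
    picked = []
    for x in targets:
        i = bisect_left(pool, x)  # first index with pool[i] >= x
        # Step to the left neighbour when x is past the end, or when the
        # left neighbour is at least as close as the right one.
        if i > 0 and (i == len(pool) or x - pool[i - 1] <= pool[i] - x):
            i -= 1
        picked.append(pool.pop(i))  # remove it so it cannot be reused
    return picked

print(sorted(nearest_no_replacement_bisect(
    [0, 1, 1, 3, 3, 3, 5],
    [0, 0, 1, 1, 1, 2, 3, 4, 4, 5, 5, 5])))
```

Removing by index with pop(i) avoids the second linear search that list.remove performs. Like the version above, this is a single greedy pass in the order of the first list, so it is not guaranteed to minimise the total distance over all possible pairings.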
User contributions licensed under: CC BY-SA