I have a data frame consisting of lists as elements. I want to find the closest matching values within a percentage of a given value. My code:
JavaScript
x
11
11
1
df = pd.DataFrame({'A':[[1,2],[4,5,6]]})
2
df
3
A
4
0 [1, 2]
5
1 [3, 5, 7]
6
7
# in each row, lets find a the values and their index that match 5 with 20% tolerance
8
val = 5
9
tol = 0.2 # find values matching 5 or 20% within 5 (4 or 6)
10
df['Matching_index'] = (df['A'].map(np.array)-val).map(abs).map(np.argmin)
11
Present solution:
JavaScript
1
5
1
df
2
A Matching_index
3
0 [1, 2] 1 # 2 matches closely with 5 but this is wrong
4
1 [4, 5, 6] 1 # 5 matches with 5, correct.
5
Expected solution:
JavaScript
1
5
1
df
2
A Matching_index
3
0 [1, 2] NaN # No matching value, hence NaN
4
1 [4, 5, 6] 1 # 5 matches with 5, correct.
5
Advertisement
Answer
Idea is get difference with val
and then replace to missing values if not match tolerance, last get np.nanargmin
which raise error if all missing values, so added next condition with np.any
:
JavaScript
1
12
12
1
def f(x):
2
a = np.abs(np.array(x)-val)
3
m = a <= val * tol
4
return np.nanargmin(np.where(m, a, np.nan)) if m.any() else np.nan
5
6
df['Matching_index'] = df['A'].map(f)
7
8
print (df)
9
A Matching_index
10
0 [1, 2] NaN
11
1 [4, 5, 6] 1.0
12
Pandas solution:
JavaScript
1
4
1
df1 = pd.DataFrame(df['A'].tolist(), index=df.index).sub(val).abs()
2
3
df['Matching_index'] = df1.where(df1 <= val * tol).dropna(how='all').idxmin(axis=1)
4