I have a pandas DataFrame that looks something like this:
    el1  el2  x     el3
    n    m    6     f
    n2   m2   7     f2
    .
    n10  m10  19.3  f10
    n11  m11  21    f11
    .
The el1, el2, and el3 columns do not matter at all. I want to find the row whose x value is nearest to 20, so I do:
    min_index = df['x'].sub(20).abs().idxmin()
which gives me the index of the row where x=19.3.
So far so good.
Now, the real problem:
Imagine I have a DataFrame where x goes from 6 to 31, then again from 7 to 45, and then again from 2 to 27.
I want a way to find a list of indexes [idx1, idx2, ..., idxn] (in the above example n would be 3), one per stretch, pointing at the values closest to, say, 20.
How can I do this with pandas and Python?
EDIT: An example of the df would be:
    el1  el2  x     el3
    n    m    6     f
    n2   m2   7     f2
    .
    n10  m10  19.3  f10
    n11  m11  21    f11
    .
    n20  m20  31    f20
    n21  m21  7     f21
    n22  m22  8.1   f22
    .
    n29  m29  19.8  f29
    n30  m30  21    f30

    n35  m35  45    f35
    n36  m36  2     f36
    n37  m37  3     f37
    .
    n45  m45  19.9  f45
    n46  m46  22    f46

    n50  m50  27    f50
The rows I want are the ones where x=19.3, x=19.8, and x=19.9.
Answer
Assuming you have consecutive stretches of increasing values and want to find the closest to 20 for each:
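    # label each increasing stretch: a new group starts wherever x decreases
    group = df['x'].diff().lt(0).cumsum()

    # within each group, take the index of the row whose x is closest to 20
    out = df.loc[df['x'].sub(20).abs().groupby(group).idxmin()]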
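To see why the grouping expression works, it may help to look at the intermediate steps on a tiny series. This is only an illustrative sketch: the values below are made up and are not from the question. `diff()` gives the change from the previous row, `lt(0)` flags the rows where the value drops (the start of a new stretch), and `cumsum()` turns those flags into a running stretch label.

```python
import pandas as pd

# Made-up toy data with three increasing stretches: 1-5-9, 2-6, 3-4
s = pd.Series([1, 5, 9, 2, 6, 3, 4], name="x")

steps = pd.DataFrame({
    "x": s,
    "diff": s.diff(),                  # change from the previous row (NaN on the first row)
    "drop": s.diff().lt(0),            # True where x decreases; NaN compares as False
    "group": s.diff().lt(0).cumsum(),  # running count of drops = stretch label
})
print(steps)
# The "group" column comes out as 0, 0, 0, 1, 1, 2, 2
```

Note that two equal consecutive values give a difference of 0, which is not less than 0, so they stay in the same stretch.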
example input:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'x': np.r_[np.linspace(6, 31, 5),
                                  np.linspace(7, 45, 5),
                                  np.linspace(2, 27, 5),
                                  ],
                       'el1': '-'
                       })
output:
            x el1
    2   18.50   -
    6   16.50   -
    13  20.75   -
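As a quick check against the question itself, the same approach can be wrapped in a small function and run on the x values that are visible in the question's EDIT. This is a sketch under some assumptions: `closest_per_stretch` is a hypothetical helper name, the rows elided with `.` in the question are simply left out, and each stretch is assumed to be increasing.

```python
import pandas as pd

def closest_per_stretch(df, target, col="x"):
    """Hypothetical helper: for each increasing stretch of `col`,
    return the row whose value is closest to `target`."""
    # a new stretch starts wherever the value decreases
    group = df[col].diff().lt(0).cumsum()
    # index label of the smallest absolute distance within each stretch
    idx = df[col].sub(target).abs().groupby(group).idxmin()
    return df.loc[idx]

# Only the x values shown in the question's EDIT (elided rows omitted)
df = pd.DataFrame({"x": [6, 7, 19.3, 21, 31,
                         7, 8.1, 19.8, 21, 45,
                         2, 3, 19.9, 22, 27]})

out = closest_per_stretch(df, 20)
print(out.index.tolist())  # [2, 7, 12]
print(out["x"].tolist())   # [19.3, 19.8, 19.9]
```

If two rows in the same stretch are equally close to the target, `idxmin` returns the first of them.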