
Connect the dots in pandas

TLDR

I want to do the equivalent of an Excel VLOOKUP in pandas. What makes this question different is that the exact values I am looking up do not exist in the lookup table. I want to linearly interpolate between the nearest values, so the usual .map approach does not work.

Question

I have a pandas dataframe, with columns x and y.

I have another pandas dataframe, with many different values of x, and I want to map the first dataframe onto the second. The problem is that x is continuous: there are many x values in the second dataframe which aren’t in the first. So if I do the usual approach of df2['y'] = df2['x'].map(df1.set_index('x')['y']), I’ll get NaNs (or key errors). I want to do a lookup with interpolation. How do I do that?
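To make the problem concrete, here is a minimal sketch of the exact-match lookup. The data is an assumption chosen for illustration (y = sin(x) sampled every 0.1 in df1, lookups every 0.06 in df2); it is not the original data.

```python
import numpy as np
import pandas as pd

# Assumed data: df1 holds known points of y = sin(x), spaced 0.1 apart.
df1 = pd.DataFrame({"x": np.arange(0, 30) / 10})
df1["y"] = np.sin(df1["x"])

# df2 holds the x values to look up, spaced 0.06 apart.
df2 = pd.DataFrame({"x": np.arange(0, 50) * 6 / 100})

# Exact-match lookup: only x values that appear in both frames get a y.
lookup = df1.set_index("x")["y"]
df2["y"] = df2["x"].map(lookup)

print(df2["y"].isna().sum(), "of", len(df2), "lookups returned NaN")
```

Only the x values the two grids happen to share (0, 0.3, 0.6, ...) find a match; every other row comes back as NaN.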

MWE

Steps to reproduce:


Current output:

Graph of dots near and on sin wave

Desired Output:

I want the red dots to be shifted vertically, so that they lie on the blue curve.

i.e. replace


with something like:



Answer

IIUC, you have a set of points that come from an underlying function, and you now need to interpolate some intermediate points so that they follow the same function.

That is, the points spaced 0.1 apart come from a given function, and you want approximate values for the points spaced 0.06 apart, as if they came from that same function.

Here is what you can do.

  1. Let's assume your 0.1 points come from the function f(x).
  2. Now, let's create the 0.06 points with NaN values and combine them with the 0.1 points.
  3. Next, let's sort all of them by the value of x.
  4. Now you have a sequence of points where the 0.1 points are filled with values from f(x) and the 0.06 points are filled with NaN.
  5. You can simply use the dataframe's interpolate() method to fill in the NaN values, and then separate out the 0.06 points.
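The steps above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code behind the plot: f(x) = sin(x), the column name known, and the choice of method="index" are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Step 1. Assumed known points: f(x) = sin(x), spaced 0.1 apart.
df1 = pd.DataFrame({"x": np.arange(0, 30) / 10})
df1["y"] = np.sin(df1["x"])

# Step 2. Points to look up, spaced 0.06 apart, with y still unknown (NaN).
df2 = pd.DataFrame({"x": np.arange(0, 49) * 6 / 100, "y": np.nan})

# Hypothetical helper column so the two sets can be separated again later.
df1["known"] = True
df2["known"] = False

# Step 3. Combine and sort by x. A stable sort keeps a known point ahead of
# an unknown point that has the same x, so no NaN ends up before the first
# known value (leading NaNs are not filled by default).
combined = (
    pd.concat([df1, df2])
    .sort_values("x", kind="mergesort")
    .set_index("x")
)

# Steps 4-5. method="index" interpolates linearly against the x values in
# the index, instead of assuming the rows are equally spaced.
combined["y"] = combined["y"].interpolate(method="index")

# Separate out the 0.06 points again.
result = combined.loc[~combined["known"], "y"].reset_index()
print(result.head())
```

If a plain linear lookup is all that is needed, NumPy's np.interp(df2["x"], df1["x"], df1["y"]) computes the same values without the concat and sort, provided df1 is sorted by x.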

graph of dots on sin wave

NOTE: The blue points are the 0.1 distance points that come directly from the green function. The red points are the ‘intermediate’ points at 0.06 distance which have to be interpolated. As the curve shows, the interpolation does well.

You can try other interpolation methods by changing the method parameter (maybe try a cubic spline!). See the pandas documentation for interpolate for details.


I don't think pd.merge_asof will solve what you need, because it only maps each point to the nearest existing value:


It doesn’t interpolate from the underlying distribution. It simply assigns each point the value of its nearest neighbour, based on the distance between the two x points. So, for 1.6 the value was 0.412215.

However, every x from 1.6 up to 1.7 is now also set to 0.412215. With interpolation, the values are approximated so that 1.61 gets a different value than 1.65 and 1.68.
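A small sketch of the difference, using assumed data (sin(x) sampled every 0.1, so the numbers will not match the 0.412215 above) and np.interp as a stand-in for linear interpolation:

```python
import numpy as np
import pandas as pd

# Assumed known points: y = sin(x), spaced 0.1 apart (illustrative only).
df1 = pd.DataFrame({"x": np.arange(0, 30) / 10})
df1["y"] = np.sin(df1["x"])

# Three lookups that all fall between the known points 1.6 and 1.7.
df2 = pd.DataFrame({"x": [1.61, 1.65, 1.68]})

# merge_asof (default direction="backward") copies y from the last known
# x that is <= each lookup x, so all three rows get the value at 1.6.
stepped = pd.merge_asof(df2, df1, on="x")

# Linear interpolation gives each lookup its own value.
interpolated = np.interp(df2["x"], df1["x"], df1["y"])

print(stepped["y"].tolist())
print(interpolated.tolist())
```

Passing direction="nearest" to merge_asof changes which neighbour is chosen, but the result is still a step function rather than an interpolated curve.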

Hope that makes sense.
