Connect the dots in pandas

Question

TLDR: I want to do the equivalent of an Excel VLOOKUP in pandas. The unusual thing about this question is that the exact values I am looking up do not exist. I want to use linear interpolation to look up a value between the nearest points, so the usual .map approach does not work.

Question: I have a pandas series, with columns

Accepted Answer

IIUC, you have a set of points that come from an underlying function, and you now have to interpolate some intermediate points on that same function. Say the points spaced 0.1 apart come from a given function. You want approximate values for points spaced 0.06 apart, such that they come from the same underlying function.

Here is what you can do:

1. Let's assume your 0.1 points come from the function f(x).
2. Build the points at 0.06 spacing with NaN values and combine them with the 0.1 points.
3. Sort all of them by the value of x.
4. You now have a sequence of points where the 0.1 points are filled with values from f(x) and the 0.06 points are filled with NaN.
5. Use DataFrame.interpolate() (or Series.interpolate()) to fill the gaps, then separate out the 0.06 points.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    x1 = np.arange(0.0, 2.0, 0.1)
    x2 = np.arange(0.0, 2.0, 0.06)

    def f(x):
        return 1 + np.sin(2 * np.pi * x)

    df1 = pd.DataFrame({'A': 'x1', 'x': x1, 'y': f(x1)})   # dataframe with filled values
    df2 = pd.DataFrame({'A': 'x2', 'x': x2, 'y': np.nan})  # dataframe with NaNs
    df3 = pd.concat([df1, df2]).sort_values('x')           # vertically combine and sort by x
    df3 = df3.set_index('x')
    # Interpolate only the numeric y column against the x index
    # (column A holds strings, which newer pandas versions refuse to interpolate)
    df3['y'] = df3['y'].interpolate(method='index')
    df3 = df3.reset_index()
    df3 = df3[df3['A'] != 'x1']  # drop the rows which aren't in df2

    # Plot all three
    plt.plot(df1['x'], df1['y'], 'x-', label='reference', c='green')  # original function
    plt.scatter(df1['x'], df1['y'], label='original', c='blue')       # points at 0.1
    plt.scatter(df3['x'], df3['y'], label='fitted', c='red')          # interpolated points at 0.06
    plt.legend()
    plt.show()

NOTE: The blue points are the 0.1 distance points that come directly from the green function. The red points are the 'intermediate' points at 0.06 distance, which have to be interpolated. As the curve shows, the interpolation does well.

You can try other methods of interpolation by changing the method parameter (maybe try cubic spline!).
Check the following link for details.

I don't think pd.merge_asof will solve what you need, because it only maps each row to the nearest existing value:

    df2 = pd.merge_asof(df1, df2, on='x', direction='nearest')
    print(df2)

          x       y_x       y_y
    0   0.0  1.000000  1.000000
    1   0.1  1.587785  1.587785
    2   0.2  1.951057  1.587785
    3   0.3  1.951057  1.951057
    4   0.4  1.587785  1.587785
    5   0.5  1.000000  1.587785
    6   0.6  0.412215  1.000000
    7   0.7  0.048943  0.048943
    8   0.8  0.048943  0.048943
    9   0.9  0.412215  0.048943
    10  1.0  1.000000  1.000000
    11  1.1  1.587785  1.000000
    12  1.2  1.951057  1.587785
    13  1.3  1.951057  1.951057
    14  1.4  1.587785  1.951057
    15  1.5  1.000000  1.587785
    16  1.6  0.412215  0.412215 #<--- Same value mapped!
    17  1.7  0.048943  0.412215 #<--- Same value mapped!
    18  1.8  0.048943  0.048943
    19  1.9  0.412215  0.412215

It doesn't interpolate from the underlying distribution. It simply maps values, setting each one to the nearest match based on the distance between the two x points. So, for 1.6 the value was 0.412215. However, all values from 1.6 to 1.7 are now set to 0.412215 as well. With interpolation, the values would be approximated so that 1.61 gets a different value than 1.65 and 1.68.

Hope that makes sense.
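To make the contrast between interpolation and nearest-value mapping concrete, here is a minimal sketch of a direct interpolated lookup built on np.interp. It is not part of the original answer: the helper name interp_lookup and the query values are made up for illustration, and it assumes the lookup table has a sorted numeric index with no NaN values.

```python
import numpy as np
import pandas as pd


def f(x):
    # Same illustrative function as in the answer above.
    return 1 + np.sin(2 * np.pi * x)


# Known points on a 0.1 grid, acting as the "lookup table".
# Rounding only tidies the floating-point grid values.
x_known = np.round(np.arange(0.0, 2.0, 0.1), 1)
table = pd.Series(f(x_known), index=x_known)


def interp_lookup(table, x_new):
    """Hypothetical helper: linearly interpolate `table` at `x_new`.

    Assumes `table` has a sorted, numeric index and contains no NaN.
    """
    x_new = np.asarray(x_new, dtype=float)
    y_new = np.interp(x_new, table.index.to_numpy(), table.to_numpy())
    return pd.Series(y_new, index=x_new)


# Query points that do not exist in the table.
queries = [1.61, 1.65, 1.68]
result = interp_lookup(table, queries)
print(result)
```

Unlike the merge_asof result above, the three queries between 1.6 and 1.7 each get a different value, lying on the straight line between the two neighbouring table entries.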
