Resampling with pandas spline interpolation gives strange results even though the times match. What am I misunderstanding?

I have a DataFrame indexed by time in seconds, and I resample it to a fixed step of n seconds so that all values land on an evenly spaced grid.

The times are parsed correctly, but the interpolated output looks wrong, so maybe I'm completely misunderstanding which points the spline is actually fitted over?

import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame(
    {
        "time": [0., 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0],
        "floats": [0., 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0],
        "ints": [0, 1, 1, 1, 1, 0, 0, 1],
    }
)

df["time"] = pd.to_timedelta(df["time"], unit="s")

df.set_index("time", inplace=True)
df_interpolated = df.resample("2s").interpolate("spline", order=1)

print("Input:")
print(df)

print("Output:")
print(df_interpolated)

# interp1d needs numeric x values, so convert the timedelta index to integer nanoseconds
f_data_int = interp1d(df.index.astype("int64"), df["ints"])
interpolated_int = f_data_int(df_interpolated.index.astype("int64"))

f_data_float = interp1d(df.index.astype("int64"), df["floats"])
interpolated_float = f_data_float(df_interpolated.index.astype("int64"))

df_fixed = df_interpolated.copy()

df_fixed["floats"] = interpolated_float
df_fixed["ints"] = interpolated_int#.astype(int)
print("Expected:")
print(df_fixed.round(2))

Gives

Input:
                        floats  ints
time                                
0 days 00:00:00            0.0     0
0 days 00:00:01.100000     0.1     1
0 days 00:00:03.300000     0.2     1
0 days 00:00:04.400000     0.3     1
0 days 00:00:05.500000     0.4     1
0 days 00:00:07.700000     0.3     0
0 days 00:00:09.900000     0.2     0
0 days 00:00:10            0.0     1


Output:
                 floats  ints
time                         
0 days 00:00:00     0.0   0.0
0 days 00:00:02     0.0   0.2
0 days 00:00:04     0.0   0.4
0 days 00:00:06     0.0   0.6
0 days 00:00:08     0.0   0.8
0 days 00:00:10     0.0   1.0


Expected:
                 floats  ints
time                         
0 days 00:00:00    0.00  0.00
0 days 00:00:02    0.14  1.00
0 days 00:00:04    0.26  1.00
0 days 00:00:06    0.38  0.77
0 days 00:00:08    0.29  0.00
0 days 00:00:10    0.00  1.00

So where did my values go in the output?

Answer

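When you resample to 2s, every original row whose timestamp does not fall exactly on the 2s grid is dropped before the interpolation runs. In your data only the rows at 0s and 10s survive, so the spline is fitted through just those two points: floats goes from 0.0 to 0.0 and ints rises linearly from 0 to 1, which is exactly the output you see. The dropped rows never take part in the interpolation.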
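
To check which rows a 2s resample keeps, one can look at the grid before any interpolation happens. The following is a minimal sketch that rebuilds the DataFrame from the question and calls asfreq() on the resampler; the names on_grid and kept are only illustrative.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    {
        "time": [0., 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0],
        "floats": [0., 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0],
        "ints": [0, 1, 1, 1, 1, 0, 0, 1],
    }
)
df["time"] = pd.to_timedelta(df["time"], unit="s")
df = df.set_index("time")

# Put the data on the 2s grid without filling anything in.
# Rows whose timestamp is not exactly on the grid become NaN.
on_grid = df.resample("2s").asfreq()
print(on_grid)

# Grid points that still carry an original value
kept = on_grid.dropna().index
print(kept)
```

With the question's data, only the 0s and 10s rows should remain non-NaN, which leaves the interpolation just two points to work with.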

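A workaround is to upsample to a step small enough that every original timestamp falls on the grid, interpolate there, and then keep only the 2s points:

import datetime

# Upsample with a very small step so that the original timestamps land on the grid.
# If a timestamp does not fit the step (for example 0.111, which is not a
# multiple of 0.1), it will not land on this grid, so choose the step to match your data.
timedelta = datetime.timedelta(seconds=0.1)
# Upsample and interpolate on the fine grid
# (per the pandas docs, convention applies to PeriodIndex only)
df_interpolated = df.resample(timedelta, convention='end').interpolate("spline", order=1)
# Resample again to keep only the points at 2s intervals. Every 2s point already
# exists on the fine grid, so asfreq() introduces no missing values and no
# fill method is needed.
df_interpolated = df_interpolated.resample('2s').asfreq()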
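
As an alternative sketch (a different technique from the spline call above, not something the answer itself proposes), the original rows can be kept in play by reindexing onto the union of the original timestamps and the 2s grid, interpolating linearly along the index, and then selecting the grid points. This is plain linear interpolation, the same thing interp1d does by default in the question's reference calculation. The names target, combined and result are illustrative.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame(
    {
        "time": [0., 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0],
        "floats": [0., 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0],
        "ints": [0, 1, 1, 1, 1, 0, 0, 1],
    }
)
df["time"] = pd.to_timedelta(df["time"], unit="s")
df = df.set_index("time")

# The evenly spaced grid we want values on
target = pd.timedelta_range(start="0s", end="10s", freq="2s")

# Keep the original rows AND add the (still empty) grid rows
combined = df.reindex(df.index.union(target))

# Linear interpolation that uses the actual index values as x coordinates,
# then keep only the grid points
result = combined.interpolate(method="index").reindex(target)
print(result.round(2))
```

Because the original rows stay in the frame while interpolating, each grid value is computed from its nearest original neighbours rather than from the two endpoints only.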


Source: stackoverflow