I have a dataframe indexed by time in seconds, and I resample it every n seconds so that all values are evenly spaced.
The timestamps are parsed correctly, but the output is strange, so maybe I’m completely misunderstanding what exactly is being splined over?
```python
import pandas as pd
from scipy.interpolate import interp1d

df = pd.DataFrame(
    {
        "time": [0., 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0],
        "floats": [0., 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0],
        "ints": [0, 1, 1, 1, 1, 0, 0, 1],
    }
)
df["time"] = pd.to_timedelta(df["time"], unit="s")
df.set_index("time", inplace=True)

df_interpolated = df.resample("2s").interpolate("spline", order=1)

print("Input:")
print(df)
print("Output:")
print(df_interpolated)

f_data_int = interp1d(df.index.astype(int), df["ints"])
interpolated_int = f_data_int(df_interpolated.index.astype(int))
f_data_float = interp1d(df.index.astype(int), df["floats"])
interpolated_float = f_data_float(df_interpolated.index.astype(int))

df_fixed = df_interpolated.copy()
df_fixed["floats"] = interpolated_float
df_fixed["ints"] = interpolated_int  # .astype(int)

print("Expected:")
print(df_fixed.round(2))
```
Gives
```
Input:
                        floats  ints
time
0 days 00:00:00            0.0     0
0 days 00:00:01.100000     0.1     1
0 days 00:00:03.300000     0.2     1
0 days 00:00:04.400000     0.3     1
0 days 00:00:05.500000     0.4     1
0 days 00:00:07.700000     0.3     0
0 days 00:00:09.900000     0.2     0
0 days 00:00:10            0.0     1
Output:
                 floats  ints
time
0 days 00:00:00     0.0   0.0
0 days 00:00:02     0.0   0.2
0 days 00:00:04     0.0   0.4
0 days 00:00:06     0.0   0.6
0 days 00:00:08     0.0   0.8
0 days 00:00:10     0.0   1.0
Expected:
                 floats  ints
time
0 days 00:00:00    0.00  0.00
0 days 00:00:02    0.14  1.00
0 days 00:00:04    0.26  1.00
0 days 00:00:06    0.38  0.77
0 days 00:00:08    0.29  0.00
0 days 00:00:10    0.00  1.00
```
So where did my values go in the output?
Answer
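When you resample to 2s, every row whose timestamp does not fall exactly on the 2s grid is dropped; in your data only the rows at 0s and 10s survive. The interpolation therefore never sees the other values and simply connects those two remaining points, which is why "floats" is 0.0 everywhere and "ints" rises in a straight line from 0 to 1.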
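To see the data loss directly, here is a minimal sketch that rebuilds the "floats" column from the question and looks at the resampled frame before any interpolation. The name `on_grid` is only illustrative.

```python
import pandas as pd

# Same "floats" data as in the question, indexed by timedelta
df = pd.DataFrame(
    {"floats": [0.0, 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0]},
    index=pd.to_timedelta([0.0, 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0], unit="s"),
)

# asfreq() keeps only rows whose timestamp is exactly on the 2s grid;
# every other grid point becomes NaN
on_grid = df.resample("2s").asfreq()
print(on_grid)

# Filling those NaNs can only use the two surviving points (0s and 10s)
filled = on_grid.interpolate("linear")
print(filled)
```

Only the first and last rows of `on_grid` hold a value, so whatever interpolation method follows has just those two points to work with.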
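One workaround is to upsample to a much finer grid first, so that the original samples survive, interpolate there, and only then keep the points at 2s intervals:

```python
import datetime

# Upsample with a very small step so the original samples land on the grid.
# The step must divide the original timestamps: a time like 0.111 s is not a
# multiple of 0.1 s and would still be dropped.
timedelta = datetime.timedelta(seconds=0.1)

# Upsample and interpolate. (`convention` only applies to a PeriodIndex,
# so it is left out here.)
df_interpolated = df.resample(timedelta).interpolate("spline", order=1)

# Resample to keep only the points at 2s intervals. Nothing is missing on the
# fine grid any more, so asfreq() just picks the existing values.
df_interpolated = df_interpolated.resample("2s").asfreq()
```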
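If plain linear interpolation is what you want (as in your `interp1d` reference), another option is to skip the fine grid and interpolate the original samples straight onto the 2s grid. This is a sketch of a different technique from the one above, using `numpy.interp`; the names `target`, `x_old`, `x_new` and `df_linear` are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "floats": [0.0, 0.1, 0.2, 0.3, 0.4, 0.3, 0.2, 0.0],
        "ints": [0, 1, 1, 1, 1, 0, 0, 1],
    },
    index=pd.to_timedelta([0.0, 1.1, 3.3, 4.4, 5.5, 7.7, 9.9, 10.0], unit="s"),
)

# Target grid: every 2 s from the first to the last sample
target = pd.timedelta_range(df.index.min(), df.index.max(), freq="2s")

# np.interp works on plain numbers, so express both indexes in seconds
x_old = np.asarray(df.index.total_seconds())
x_new = np.asarray(target.total_seconds())

# Linearly interpolate each column from the original samples onto the grid
df_linear = pd.DataFrame(
    {col: np.interp(x_new, x_old, df[col].to_numpy()) for col in df.columns},
    index=target,
)
print(df_linear.round(2))
```

Because every original sample takes part in the interpolation, this does not depend on the timestamps being multiples of some small step.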