Resampling timestamps in a CSV

Question

I have a CSV file that stores data from different smartphone sensors. The timestamps are elapsed nanoseconds since the program that records the data was started. Short example: The time steps between the timestamps are not equal, but I would like them to be. How can I achieve this? I was thinking about simply downsampling the nanoseconds to

Accepted Answer

When you resample over milliseconds, there aren't enough values to fill consecutive buckets, so you end up with NaNs.

If you want your time steps to be equal while also having all buckets filled, you can find the maximum difference between consecutive timestamps and use that as the resampling rate.

First, set the index to be Timedeltas, since it's the time elapsed since the app started.

    import pandas as pd

    df.index = df.index.map(lambda t: pd.Timedelta(t, unit='ns'))
    df.index
    # output:
    TimedeltaIndex(['0 days 00:00:00.025993266', '0 days 00:00:00.028129496',
                    '0 days 00:00:00.031028666', '0 days 00:00:00.033164897',
                    '0 days 00:00:00.036064067', '0 days 00:00:00.038200297',
                    '0 days 00:00:00.041099467', '0 days 00:00:00.043235697',
                    '0 days 00:00:00.046134867'],
                   dtype='timedelta64[ns]', name='timestamps', freq=None)

Next, resampling:

    import numpy as np

    max_diff = np.diff(df.index).max()
    # numpy.timedelta64(2899170,'ns')

    # convert to pandas.Timedelta to use it with `resample`
    dfr = df.resample(pd.Timedelta(max_diff)).mean()
    dfr

Output:

                                 acce_x    acce_y    acce_z    grav_x    grav_y    grav_z
    timestamps
    0 days 00:00:00.025993266 -2.529037  6.918060  4.340012 -2.888277  7.903406  5.034998
    0 days 00:00:00.028892436 -2.537415  6.931229  4.605766 -2.850200  7.807237  5.205172
    0 days 00:00:00.031791606 -2.545792  6.944397  4.871521 -2.796879  7.735252  5.337435
    0 days 00:00:00.034690776 -2.472771  6.912071  5.180374 -2.743558  7.663267  5.469699
    0 days 00:00:00.037589946 -2.399750  6.879746  5.489227 -2.664888  7.592961  5.602406
    0 days 00:00:00.040489116 -2.187862  6.843834  5.941738 -2.544471  7.490385  5.794085
    0 days 00:00:00.043388286 -1.990341  6.810318  6.321220 -2.419225  7.393572  5.971000

And to verify that your index is evenly spaced, it has freq='2899170N':

    dfr.index
    # output:
    TimedeltaIndex(['0 days 00:00:00.025993266', '0 days 00:00:00.028892436',
                    '0 days 00:00:00.031791606', '0 days 00:00:00.034690776',
                    '0 days 00:00:00.037589946', '0 days 00:00:00.040489116',
                    '0 days 00:00:00.043388286'],
                   dtype='timedelta64[ns]', name='timestamps', freq='2899170N')

Or check via diff:

    np.diff(dfr.index)
    # output:
    array([2899170, 2899170, 2899170, 2899170, 2899170, 2899170],
          dtype='timedelta64[ns]')
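For anyone who wants to try this without the original CSV, here is a minimal, self-contained sketch of the same idea. The timestamps are the nanosecond values from the index shown above; the acce_x readings are made up for illustration, and the gap is computed with `to_series().diff()` rather than `np.diff`, which should give the same result. It also resamples at 1 ms to show the empty buckets mentioned at the start of the answer.

```python
import numpy as np
import pandas as pd

# Elapsed-nanosecond timestamps, copied from the index printed in the answer.
timestamps_ns = [25993266, 28129496, 31028666, 33164897, 36064067,
                 38200297, 41099467, 43235697, 46134867]

# Assumption: made-up sensor readings, one per timestamp (not the real data).
df = pd.DataFrame(
    {"acce_x": np.linspace(-2.5, -2.0, len(timestamps_ns))},
    index=pd.Index(timestamps_ns, name="timestamps"),
)

# Interpret the integer index as elapsed time.
df.index = pd.to_timedelta(df.index, unit="ns")

# Resampling at 1 ms leaves some buckets without any sample, hence NaNs.
dfr_ms = df.resample("1ms").mean()
print("empty 1 ms buckets:", int(dfr_ms["acce_x"].isna().sum()))

# Largest gap between consecutive samples, as a pandas Timedelta.
max_diff = df.index.to_series().diff().max()

# With a bucket at least as wide as the largest gap, no bucket stays empty.
dfr = df.resample(max_diff).mean()

# Spacing of the resampled index; every step should equal max_diff.
steps = np.diff(dfr.index.to_numpy())
print(dfr)
print(steps)
```

Note that the result is only as fine-grained as the largest gap in the data, so a single long pause in the recording would coarsen the whole series. If that matters, interpolating onto a fixed grid may be a better fit than bucket averaging.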

Answer