I want to select data from different DataFrames; how can I speed it up?

Question

I want to take the last data point before a specified time from several DataFrames that use different time intervals. My code is as follows: On my computer, get_result_df() takes 204 ms to run. How can I speed it up? I optimized it and reduced the running time to 53 ms. Is there any room for further improvement?

Accepted Answer

My timings are roughly half of yours, but I see the same relative behavior. Using np.argmin is faster; see below.

    In [1]: %timeit get_result_df()
    115 ms ± 3.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

    In [2]: %timeit get_result_df2()
    26.2 ms ± 387 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Using argmin with iloc directly is faster still:

    def get_result_df3():
        global durations, datas, time_selected
        t_df = {}
        col = ['duration', 'data']
        for duration in durations:
            df = datas[duration]
            dt = df.index.to_numpy()
            # argmin returns the position of the first False in the mask,
            # so the entry just before it is the last timestamp <= time_selected
            idx = np.argmin(dt <= time_selected) - 1
            t_df[duration] = df.iloc[idx][col]
        df = pd.DataFrame(t_df[duration] for duration in durations)
        return df

    In [2]: %timeit get_result_df3()
    9.62 ms ± 23.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
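The argmin idea can be tried in isolation with the minimal sketch below. The question's data is not shown, so `durations`, `datas` and `time_selected` here are hypothetical stand-ins built from synthetic, sorted timestamps, and the helper names are illustrative only. The sketch also handles two edge cases explicitly: every timestamp is at or before the selected time, or none is. The `searchsorted` variant is not part of the answer above; it is an alternative that assumes the index is sorted in ascending order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's globals (the real data is not shown).
durations = ["1min", "5min", "15min"]
datas = {}
for duration in durations:
    index = pd.date_range("2021-01-01 09:00", periods=50, freq=duration)
    datas[duration] = pd.DataFrame(
        {"duration": duration, "data": np.arange(len(index))}, index=index
    )
time_selected = pd.Timestamp("2021-01-01 09:32")


def last_pos_argmin(index, when):
    """Position of the last timestamp <= when, or -1 if there is none."""
    mask = index.to_numpy() <= when.to_datetime64()
    if mask.all():
        # No False in the mask, so argmin would misleadingly return 0.
        return len(mask) - 1
    # argmin finds the first False; the entry just before it is the last True.
    return int(np.argmin(mask)) - 1


def last_pos_searchsorted(index, when):
    """Same result via binary search; assumes the index is sorted ascending."""
    return int(index.searchsorted(when, side="right")) - 1


def get_result_df_sketch():
    rows = []
    for duration in durations:
        df = datas[duration]
        pos = last_pos_argmin(df.index, time_selected)
        if pos >= 0:  # skip frames with no row at or before time_selected
            rows.append(df.iloc[pos][["duration", "data"]])
    return pd.DataFrame(rows)


result = get_result_df_sketch()
print(result)
```

On large sorted indexes the binary search avoids building a full boolean mask, so it may be worth timing both variants with %timeit on the real data.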


Answer