Add missing rows in pandas DataFrame

Question

I have a DataFrame that looks like this:

What I want to get is:

In short, for each id, add the missing time rows with value 0. How do I do this? I wrote something with a loop, but it would be prohibitively slow for my use case, which has several million rows.

Accepted Answer

Here's one way using groupby.apply, where we use date_range to generate the missing times for each id. Then merge the result back to df and fill in the missing values of the other columns:

    import pandas as pd

    df['time'] = pd.to_datetime(df['time'])
    # For each id, build every one-second timestamp between its first and last time,
    # then right-merge on the shared 'time' column so the missing times become new rows.
    out = df.merge(df.groupby('id')['time'].apply(lambda x: pd.date_range(x.iat[0], x.iat[-1], freq='S')).explode(), how='right')
    # New rows have NaN in the other columns: carry the id forward and set reward to 0.
    out['id'] = out['id'].ffill().astype(int)
    out['reward'] = out['reward'].fillna(0)

Output:

        id  reward                time
    0    1    0.10 2022-04-23 10:00:00
    1    1    0.00 2022-04-23 10:00:01
    2    1    0.00 2022-04-23 10:00:02
    3    1    0.00 2022-04-23 10:00:03
    4    1    0.00 2022-04-23 10:00:04
    5    1    0.15 2022-04-23 10:00:05
    6    1    0.00 2022-04-23 10:00:06
    7    1    0.05 2022-04-23 10:00:07
    8    2    0.25 2022-04-23 12:00:00
    9    2    0.00 2022-04-23 12:00:01
    10   2    0.00 2022-04-23 12:00:02
    11   2    0.40 2022-04-23 12:00:03
    12   3    0.45 2022-04-23 15:00:00
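For anyone who wants to try this end to end, here is a minimal self-contained sketch. It is a variant of the answer, not the same code. The sample input is an assumption, inferred from the non-zero rows of the output shown in the answer (the question's original table is not reproduced here), and the names bounds and grid are only illustrative. It builds an explicit (id, time) grid and left-merges on both columns, which should keep ids separate even when two ids share a timestamp.

```python
import pandas as pd

# Hypothetical input, inferred from the non-zero rows of the answer's output.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 3],
    "reward": [0.10, 0.15, 0.05, 0.25, 0.40, 0.45],
    "time": [
        "2022-04-23 10:00:00", "2022-04-23 10:00:05", "2022-04-23 10:00:07",
        "2022-04-23 12:00:00", "2022-04-23 12:00:03", "2022-04-23 15:00:00",
    ],
})
df["time"] = pd.to_datetime(df["time"])

# First and last timestamp for each id.
bounds = df.groupby("id")["time"].agg(["min", "max"])

# One row per (id, second) between that id's first and last timestamp.
grid = pd.concat(
    [
        pd.DataFrame({"id": i, "time": pd.date_range(lo, hi, freq="s")})
        for i, lo, hi in bounds.itertuples()
    ],
    ignore_index=True,
)

# Left-merge on both keys so every grid row is kept, then fill the gaps with 0.
out = grid.merge(df, on=["id", "time"], how="left")
out["reward"] = out["reward"].fillna(0)
out = out[["id", "reward", "time"]]

print(out)
```

The list comprehension still loops in Python, but only once per id rather than once per row, so whether it is fast enough depends on how many distinct ids there are.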


Answer