
Resample df to smaller time steps and average the counts

I have a dataframe containing counts over time periods (rainfall in periods of 3 hours), something like this:

time_stamp,           rain_fall_in_mm
2019-01-01 00:03:00,  0.0
2019-01-01 00:06:00,  3.9
2019-01-01 00:09:00,  0.0
2019-01-01 00:12:00,  1.2

I need to upsample the dataframe into time periods of 1 hour, spreading each rain count evenly over the new periods so that there are no NaNs and the total amount of rain stays the same. This is the desired result:

time_stamp,           rain_fall_in_mm
2019-01-01 00:01:00,  0.0
2019-01-01 00:02:00,  0.0
2019-01-01 00:03:00,  0.0
2019-01-01 00:04:00,  1.3
2019-01-01 00:05:00,  1.3
2019-01-01 00:06:00,  1.3
2019-01-01 00:07:00,  0.0
2019-01-01 00:08:00,  0.0
2019-01-01 00:09:00,  0.0
2019-01-01 00:10:00,  0.4
2019-01-01 00:11:00,  0.4
2019-01-01 00:12:00,  0.4

I found that I can use series.resample('1H').bfill() or series.resample('1H').pad(). These handle the resampling, but they repeat each value instead of averaging it out. Do you have any suggestions? Thanks.


Answer

First, make sure that your index is a DatetimeIndex. If it is not, you can build one like this:

df.set_index(pd.date_range(start=df.time_stamp[0], periods=len(df), freq='3H'), inplace=True)
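If the time_stamp column already holds the real timestamps, an alternative is to parse that column rather than rebuild the range. The sketch below assumes the column names and sample values from the question; note that the sample timestamps as written are 3 minutes apart, although the text describes 3-hour periods.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "time_stamp": [
        "2019-01-01 00:03:00",
        "2019-01-01 00:06:00",
        "2019-01-01 00:09:00",
        "2019-01-01 00:12:00",
    ],
    "rain_fall_in_mm": [0.0, 3.9, 0.0, 1.2],
})

# Parse the strings into datetimes, then move the column into the index.
df["time_stamp"] = pd.to_datetime(df["time_stamp"])
df = df.set_index("time_stamp")
```

Unlike set_index with a separately built date_range, this drops time_stamp from the columns, so only the float column remains.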

Then use this if you want to upsample only the one column:

df_rain_hourly_column = df.resample('H').bfill().rain_fall_in_mm / 3.

If your initial df contains only floats, you can operate on the whole dataframe:

df2 = df.resample('H').bfill() / 3.

Dividing by 3. (the ratio old_time_period / new_time_period) is a bit hacky, but I really haven’t found a simpler, more general solution anywhere.
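One caveat with resample(...).bfill() is that the upsampled index starts at the first timestamp, so the first period gets only one row and loses part of its total whenever its value is non-zero. A minimal sketch of a more general version follows. It assumes each timestamp marks the end of its period, as in the desired output of the question; spread_evenly is a hypothetical helper name, and the index is written in hours to match the 3-hour description.

```python
import pandas as pd

def spread_evenly(series, old_step, new_step):
    """Upsample period totals so each new step gets an equal share."""
    old_step, new_step = pd.Timedelta(old_step), pd.Timedelta(new_step)
    factor = old_step // new_step  # number of new steps per old period
    # Assumption: each stamp marks the END of its period, so the first
    # period begins one old_step before the first stamp.
    new_index = pd.date_range(
        start=series.index[0] - old_step + new_step,
        end=series.index[-1],
        freq=new_step,
    )
    # Back-fill each total over its own period, then split it evenly.
    return series.reindex(new_index).bfill() / factor

# Values from the question, indexed as 3-hour periods.
rain = pd.Series(
    [0.0, 3.9, 0.0, 1.2],
    index=pd.date_range("2019-01-01 03:00", periods=4, freq=pd.Timedelta(hours=3)),
    name="rain_fall_in_mm",
)
hourly = spread_evenly(rain, pd.Timedelta(hours=3), pd.Timedelta(hours=1))
```

Here hourly has 12 rows with no NaNs, and its sum matches the original total up to floating-point rounding. The factor is computed from the two step sizes instead of being hard-coded as 3.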

User contributions licensed under: CC BY-SA