I have the following code that generates a timeseries with 1 min steps but would like to have the time gaps filled. i.e 13:58 is missing in between. Every ip should be represented in the gap with zero values.
flow = {'date': ['2020-11-13 13:57:51','2020-11-13 13:57:51','2020-11-13 13:57:52','2020-11-13 13:59:53','2020-11-13 13:59:54'], 'source_ip': ['192.168.1.1','192.168.1.2','10.0.0.1','192.168.1.1','192.168.1.1'], 'destination_ip': ['10.0.0.1', '10.0.0.1', '192.168.1.1', '192.168.1.2', '192.168.1.2'], 'source_bytes':[5,1,2,3,3] } df = pd.DataFrame(flow, columns = ['date', 'source_ip', 'destination_ip', 'source_bytes']) df['date'] = pd.to_datetime(df['date']) df2 = (df.melt(['date', 'source_bytes'], value_name='ip') .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes'] .agg(['sum','min','mean']) .unstack(fill_value=0) .stack() .reset_index() ) print (df2) ip date sum min mean 0 10.0.0.1 2020-11-13 13:57:00 8 1 2.666667 1 10.0.0.1 2020-11-13 13:59:00 0 0 0.000000 2 192.168.1.1 2020-11-13 13:57:00 7 2 3.500000 3 192.168.1.1 2020-11-13 13:59:00 6 3 3.000000 4 192.168.1.2 2020-11-13 13:57:00 1 1 1.000000 5 192.168.1.2 2020-11-13 13:59:00 6 3 3.000000
How can this be achieved?
Advertisement
Answer
First change unstack
by first level for DatetimeIndex
, and add DataFrame.asfreq
for add missing minutes:
df = pd.DataFrame(flow, columns = ['date', 'source_ip', 'destination_ip', 'source_bytes']) df['date'] = pd.to_datetime(df['date']) df2 = (df.melt(['date', 'source_bytes'], value_name='ip') .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes'] .agg(['sum','min','mean']) .unstack(0,fill_value=0) .asfreq('Min', fill_value=0) .stack() .reset_index() ) print (df2) date ip sum min mean 0 2020-11-13 13:57:00 10.0.0.1 8 1 2.666667 1 2020-11-13 13:57:00 192.168.1.1 7 2 3.500000 2 2020-11-13 13:57:00 192.168.1.2 1 1 1.000000 3 2020-11-13 13:58:00 10.0.0.1 0 0 0.000000 4 2020-11-13 13:58:00 192.168.1.1 0 0 0.000000 5 2020-11-13 13:58:00 192.168.1.2 0 0 0.000000 6 2020-11-13 13:59:00 10.0.0.1 0 0 0.000000 7 2020-11-13 13:59:00 192.168.1.1 6 3 3.000000 8 2020-11-13 13:59:00 192.168.1.2 6 3 3.000000