Skip to content
Advertisement

Equidistant timeseries filling the blanks

I have the following code that generates a timeseries with 1 min steps but would like to have the time gaps filled. i.e 13:58 is missing in between. Every ip should be represented in the gap with zero values.

flow = {'date': ['2020-11-13 13:57:51','2020-11-13 13:57:51','2020-11-13 13:57:52','2020-11-13 13:59:53','2020-11-13 13:59:54'],
        'source_ip': ['192.168.1.1','192.168.1.2','10.0.0.1','192.168.1.1','192.168.1.1'],
        'destination_ip': ['10.0.0.1', '10.0.0.1', '192.168.1.1', '192.168.1.2', '192.168.1.2'],
        'source_bytes':[5,1,2,3,3]
        }

df = pd.DataFrame(flow, columns = ['date', 'source_ip', 'destination_ip', 'source_bytes'])
df['date'] = pd.to_datetime(df['date'])


df2 = (df.melt(['date', 'source_bytes'], value_name='ip')
        .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes']
        .agg(['sum','min','mean'])
        .unstack(fill_value=0)
        .stack()
        .reset_index()
        )
print (df2)
            ip                date  sum  min      mean
0     10.0.0.1 2020-11-13 13:57:00    8    1  2.666667
1     10.0.0.1 2020-11-13 13:59:00    0    0  0.000000
2  192.168.1.1 2020-11-13 13:57:00    7    2  3.500000
3  192.168.1.1 2020-11-13 13:59:00    6    3  3.000000
4  192.168.1.2 2020-11-13 13:57:00    1    1  1.000000
5  192.168.1.2 2020-11-13 13:59:00    6    3  3.000000

How can this be achieved?

Advertisement

Answer

First change unstack by first level for DatetimeIndex, and add DataFrame.asfreq for add missing minutes:

df = pd.DataFrame(flow, columns = ['date', 'source_ip', 'destination_ip', 'source_bytes'])
df['date'] = pd.to_datetime(df['date'])

df2 = (df.melt(['date', 'source_bytes'], value_name='ip')
        .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes']
        .agg(['sum','min','mean'])
        .unstack(0,fill_value=0)
        .asfreq('Min', fill_value=0)
        .stack()
        .reset_index()
        )
print (df2)
                 date           ip  sum  min      mean
0 2020-11-13 13:57:00     10.0.0.1    8    1  2.666667
1 2020-11-13 13:57:00  192.168.1.1    7    2  3.500000
2 2020-11-13 13:57:00  192.168.1.2    1    1  1.000000
3 2020-11-13 13:58:00     10.0.0.1    0    0  0.000000
4 2020-11-13 13:58:00  192.168.1.1    0    0  0.000000
5 2020-11-13 13:58:00  192.168.1.2    0    0  0.000000
6 2020-11-13 13:59:00     10.0.0.1    0    0  0.000000
7 2020-11-13 13:59:00  192.168.1.1    6    3  3.000000
8 2020-11-13 13:59:00  192.168.1.2    6    3  3.000000
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement