I have the following code, which generates a time series with 1-minute steps, but I would like the time gaps to be filled. For example, 13:58 is missing in the output below. Every IP should be represented in the gap with zero values.
import pandas as pd

flow = {'date': ['2020-11-13 13:57:51','2020-11-13 13:57:51','2020-11-13 13:57:52','2020-11-13 13:59:53','2020-11-13 13:59:54'],
        'source_ip': ['192.168.1.1','192.168.1.2','10.0.0.1','192.168.1.1','192.168.1.1'],
        'destination_ip': ['10.0.0.1', '10.0.0.1', '192.168.1.1', '192.168.1.2', '192.168.1.2'],
        'source_bytes': [5,1,2,3,3]
        }

df = pd.DataFrame(flow, columns=['date', 'source_ip', 'destination_ip', 'source_bytes'])
df['date'] = pd.to_datetime(df['date'])

# melt puts source_ip and destination_ip into a single 'ip' column,
# then the bytes are aggregated per ip and per 1-minute bin
df2 = (df.melt(['date', 'source_bytes'], value_name='ip')
         .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes']
         .agg(['sum','min','mean'])
         .unstack(fill_value=0)
         .stack()
         .reset_index()
      )
print(df2)

            ip                date  sum  min      mean
0     10.0.0.1 2020-11-13 13:57:00    8    1  2.666667
1     10.0.0.1 2020-11-13 13:59:00    0    0  0.000000
2  192.168.1.1 2020-11-13 13:57:00    7    2  3.500000
3  192.168.1.1 2020-11-13 13:59:00    6    3  3.000000
4  192.168.1.2 2020-11-13 13:57:00    1    1  1.000000
5  192.168.1.2 2020-11-13 13:59:00    6    3  3.000000

How can this be achieved?
Answer
First, change unstack so that it unstacks the first level (ip), which leaves date as a DatetimeIndex. Then add DataFrame.asfreq to insert the missing minutes, filled with zeros:
import pandas as pd

# flow is the dictionary defined in the question
df = pd.DataFrame(flow, columns=['date', 'source_ip', 'destination_ip', 'source_bytes'])
df['date'] = pd.to_datetime(df['date'])

df2 = (df.melt(['date', 'source_bytes'], value_name='ip')
         .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes']
         .agg(['sum','min','mean'])
         .unstack(0, fill_value=0)        # move ip to the columns, date stays as the index
         .asfreq('1min', fill_value=0)    # add the missing minutes as rows of zeros
         .stack()                         # move ip back into the index
         .reset_index()
      )
print(df2)

                 date           ip  sum  min      mean
0 2020-11-13 13:57:00     10.0.0.1    8    1  2.666667
1 2020-11-13 13:57:00  192.168.1.1    7    2  3.500000
2 2020-11-13 13:57:00  192.168.1.2    1    1  1.000000
3 2020-11-13 13:58:00     10.0.0.1    0    0  0.000000
4 2020-11-13 13:58:00  192.168.1.1    0    0  0.000000
5 2020-11-13 13:58:00  192.168.1.2    0    0  0.000000
6 2020-11-13 13:59:00     10.0.0.1    0    0  0.000000
7 2020-11-13 13:59:00  192.168.1.1    6    3  3.000000
8 2020-11-13 13:59:00  192.168.1.2    6    3  3.000000
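
For comparison, the same gap filling can probably also be done by reindexing against every (minute, ip) combination. This is only a sketch and not the method from the answer above: the names agg, full_index and filled are made up for illustration, and it assumes the range of interest runs from the first to the last observed minute.

```python
import pandas as pd

flow = {'date': ['2020-11-13 13:57:51', '2020-11-13 13:57:51', '2020-11-13 13:57:52',
                 '2020-11-13 13:59:53', '2020-11-13 13:59:54'],
        'source_ip': ['192.168.1.1', '192.168.1.2', '10.0.0.1', '192.168.1.1', '192.168.1.1'],
        'destination_ip': ['10.0.0.1', '10.0.0.1', '192.168.1.1', '192.168.1.2', '192.168.1.2'],
        'source_bytes': [5, 1, 2, 3, 3]}

df = pd.DataFrame(flow)
df['date'] = pd.to_datetime(df['date'])

# Aggregate per ip and per 1-minute bin; only observed (ip, minute) pairs appear here
agg = (df.melt(['date', 'source_bytes'], value_name='ip')
         .groupby(['ip', pd.Grouper(key='date', freq='1min')])['source_bytes']
         .agg(['sum', 'min', 'mean']))

# Build every (minute, ip) pair between the first and last observed minute
minutes = agg.index.get_level_values('date')
ips = agg.index.get_level_values('ip').unique()
full_index = pd.MultiIndex.from_product(
    [pd.date_range(minutes.min(), minutes.max(), freq='1min'), ips],
    names=['date', 'ip'])

# Reorder the index levels to (date, ip), then fill the missing pairs with zeros
filled = (agg.swaplevel()
             .reindex(full_index, fill_value=0)
             .reset_index())
print(filled)
```

With the sample data this should give one row per IP for each of the three minutes, including 13:58. The unstack/asfreq/stack chain is shorter; the explicit index may be easier to adapt if the date range or the list of IPs has to come from somewhere else.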