I have a long dataframe with multiple timeseries which are non overlapping.
JavaScript
x
6
1
import numpy as np
2
import pandas as pd
3
df = pd.DataFrame({'id':[1,1,1,1,1,2,2,2,2,2,2],
4
't':[0,1,2,3,4,2,3,4,5,6,7],
5
'price':[10,10.2,10.8,10.1,10.4,142.1,142.8,143.6,142.8,141.4,140.7]})
6
The df looks like this
JavaScript
1
14
14
1
Out[65]:
2
id t price
3
0 1 0 10.0
4
1 1 1 10.2
5
2 1 2 10.8
6
3 1 3 10.1
7
4 1 4 10.4
8
5 2 2 142.1
9
6 2 3 142.8
10
7 2 4 143.6
11
8 2 5 142.8
12
9 2 6 141.4
13
10 2 7 140.7
14
For the time series with id 1, the missing timestamps are 5,6 and 7 and the time series #2 misses timestamps 0 and 1.
I would like to fill the missing dates for all the time series in the dataframe so all of them have all the dates filled with nan:
JavaScript
1
23
23
1
df_target = pd.DataFrame({'id':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
2
't':[0,1,2,3,4,5,6,7,0,1,2,3,4,5,6,7],
3
'price':[10,10.2,10.8,10.1,10.4,np.nan,np.nan,np.nan,np.nan,np.nan,142.1,142.8,143.6,142.8,141.4,140.7]})
4
5
Out[68]:
6
id t price
7
0 1 0 10.0
8
1 1 1 10.2
9
2 1 2 10.8
10
3 1 3 10.1
11
4 1 4 10.4
12
5 1 5 NaN
13
6 1 6 NaN
14
7 1 7 NaN
15
8 2 0 NaN
16
9 2 1 NaN
17
10 2 2 142.1
18
11 2 3 142.8
19
12 2 4 143.6
20
13 2 5 142.8
21
14 2 6 141.4
22
15 2 7 140.7
23
The objective is to be able to then reshape this dataframe to a 3d array. Is there a simple way to fill missing dates for each time series? Thanks
Advertisement
Answer
Use Series.unstack
with DataFrame.stack
:
JavaScript
1
23
23
1
df1 = (df.set_index(['id','t'])['price']
2
.unstack()
3
.stack(dropna=False)
4
.reset_index(name='price'))
5
print (df1)
6
id t price
7
0 1 0 10.0
8
1 1 1 10.2
9
2 1 2 10.8
10
3 1 3 10.1
11
4 1 4 10.4
12
5 1 5 NaN
13
6 1 6 NaN
14
7 1 7 NaN
15
8 2 0 NaN
16
9 2 1 NaN
17
10 2 2 142.1
18
11 2 3 142.8
19
12 2 4 143.6
20
13 2 5 142.8
21
14 2 6 141.4
22
15 2 7 140.7
23