Skip to content
Advertisement

pandas group by and fill in the missing time interval sequence

I have a data frame like as shown below

df = pd.DataFrame({'person_id': [11,11,11,21,21,21,31,31,31,31,31],
                   'time' :[-1,5,17,11,25,39,46,4,100,150,1],
                   'value':[101,102,121,120,143,153,160,170,96,97,99]})

What I would like to do is

a) FIll in the missing time by generating a sequence number (ex:1,2,3,4) and copy the value (for all other columns) from the previous row

I was trying something like below

df.groupby(['person_id']).cumcount() + 1
df['sequence'] = g.cumcount() + 1

But this doesn’t help me get the expected output

I expect my output to be like as shown below (SAMPLE OF 1 SUBJECT IS SHOWN BELOW)

enter image description here

Advertisement

Answer

Let’s set the time column as the index of dataframe then groupby the dataframe on person_id then for each group classified by person_id reindex the group to conform its index with the range of values specified in time column, finally concat all the groups to get the desired dataframe:

grp = df.set_index('time').groupby('person_id')
groups = [g.reindex(range(g.index.min(), g.index.max() + 1)).ffill().reset_index() for _, g in grp]
out = pd.concat(groups, ignore_index=True).reindex(df.columns, axis=1)

Alternatively you can first create tuple pairs for each person_id and corresponding range of values specified in time column then reindex the dataframe:

grp = df.groupby('person_id')['time']
idx = [(k, n) for k, t in grp  for n in range(t.min(), t.max() + 1)]
out = df.set_index(['person_id', 'time']).reindex(idx).ffill().reset_index()

Result (for person_id 11):

    person_id  time  value
0        11.0    -1  101.0
1        11.0     0  101.0
2        11.0     1  101.0
3        11.0     2  101.0
4        11.0     3  101.0
5        11.0     4  101.0
6        11.0     5  102.0
7        11.0     6  102.0
8        11.0     7  102.0
9        11.0     8  102.0
10       11.0     9  102.0
11       11.0    10  102.0
12       11.0    11  102.0
13       11.0    12  102.0
14       11.0    13  102.0
15       11.0    14  102.0
16       11.0    15  102.0
17       11.0    16  102.0
18       11.0    17  121.0
User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement