I have a data frame like as shown below
df = pd.DataFrame({'person_id': [11,11,11,21,21,21,31,31,31,31,31], 'time' :[-1,5,17,11,25,39,46,4,100,150,1], 'value':[101,102,121,120,143,153,160,170,96,97,99]})
What I would like to do is
a) FIll in the missing time
by generating a sequence number (ex:1,2,3,4) and copy the value (for all other columns) from the previous row
I was trying something like below
df.groupby(['person_id']).cumcount() + 1 df['sequence'] = g.cumcount() + 1
But this doesn’t help me get the expected output
I expect my output to be like as shown below (SAMPLE OF 1 SUBJECT IS SHOWN BELOW)
Advertisement
Answer
Let’s set the time
column as the index of dataframe then groupby
the dataframe on person_id
then for each group classified by person_id
reindex
the group to conform its index
with the range of values specified in time
column, finally concat
all the groups to get the desired dataframe:
grp = df.set_index('time').groupby('person_id') groups = [g.reindex(range(g.index.min(), g.index.max() + 1)).ffill().reset_index() for _, g in grp] out = pd.concat(groups, ignore_index=True).reindex(df.columns, axis=1)
Alternatively you can first create tuple pairs for each person_id
and corresponding range of values specified in time
column then reindex
the dataframe:
grp = df.groupby('person_id')['time'] idx = [(k, n) for k, t in grp for n in range(t.min(), t.max() + 1)] out = df.set_index(['person_id', 'time']).reindex(idx).ffill().reset_index()
Result (for person_id 11
):
person_id time value 0 11.0 -1 101.0 1 11.0 0 101.0 2 11.0 1 101.0 3 11.0 2 101.0 4 11.0 3 101.0 5 11.0 4 101.0 6 11.0 5 102.0 7 11.0 6 102.0 8 11.0 7 102.0 9 11.0 8 102.0 10 11.0 9 102.0 11 11.0 10 102.0 12 11.0 11 102.0 13 11.0 12 102.0 14 11.0 13 102.0 15 11.0 14 102.0 16 11.0 15 102.0 17 11.0 16 102.0 18 11.0 17 121.0