I need to update vid, or maybe create a new column, based on the change column.
import pandas as pd

df = pd.DataFrame([{'vid': 14, 'change': 0}, {'vid': 15, 'change': 1}, {'vid': 16, 'change': 0}, {'vid': 16, 'change': 0}, {'vid': 17, 'change': 0}, {'vid': 17, 'change': 1}, {'vid': 18, 'change': 0}, {'vid': 18, 'change': 0}])
   vid  change
0   14       0
1   15       1
2   16       0
3   16       0
4   17       0
5   17       1
6   18       0
7   18       0
If change == 1, then the next set of vid values should be changed to the current vid. If change == 0, then the next vid should stay the same.
In my example above, vid 16 needs to be changed to 15 since 15 has change = 1. However, 17 stays since 16 doesn’t have change = 1.
change = 1 will only occur when vid is going to change in the next row.
Expected output
   vid  change
0   14       0
1   15       1
2   15       0
3   15       0
4   17       0
5   17       1
6   17       0
7   17       0
Answer
If you don’t mind renumbering all the replaced numbers to be sequential, you can get a pretty clean version of it by subtracting the shifted cumsum of change:
In [59]: df['new_vid'] = df['vid'] - df['change'].shift(fill_value=0).cumsum()

In [60]: df
Out[60]:
   vid  change  new_vid
0   14       0       14
1   15       1       15
2   16       0       15
3   16       0       15
4   17       0       16
5   17       1       16
6   18       0       16
7   18       0       16
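To see why this works, it may help to look at the intermediate steps one at a time. The sketch below rebuilds the question's data and splits the one-liner into named pieces; the names shifted and offset are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "vid":    [14, 15, 16, 16, 17, 17, 18, 18],
    "change": [0, 1, 0, 0, 0, 1, 0, 0],
})

# Move each flag down one row, so the flag lands on the first row
# of the *next* vid instead of the last row of the current one.
shifted = df["change"].shift(fill_value=0)

# Running total of flags seen so far: how much to subtract from vid.
offset = shifted.cumsum()

df["new_vid"] = df["vid"] - offset
print(df.assign(shifted=shifted, offset=offset))
```

Since the offset keeps growing after every flag, all later vid values are pulled down as well, which is why 17 and 18 both end up as 16 here rather than 17.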
Another option with the original numbering is to use map and groupby:
In [81]: df['vid'] = df['vid'] - df['vid'].map(df.groupby("vid")['change'].max().shift(fill_value=0))

In [82]: df
Out[82]:
   vid  change
0   14       0
1   15       1
2   15       0
3   15       0
4   17       0
5   17       1
6   17       0
7   17       0
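Note that subtracting the shifted flag only gives the previous vid when the vid values are consecutive integers, as they are in the example. If that might not hold, a possible variation is to map each vid to the vid before it instead of subtracting 1. This is only a sketch under that assumption; the function name replace_flagged_vids is made up for illustration, and it relies on groupby sorting the vid values in ascending order.

```python
import pandas as pd


def replace_flagged_vids(df: pd.DataFrame) -> pd.Series:
    """Return vid with each value replaced by the previous vid
    whenever that previous vid has a row with change == 1."""
    # One entry per vid (sorted): 1 if any row of that vid is flagged.
    flag = df.groupby("vid")["change"].max()
    # Flag of the preceding vid; the first vid has no predecessor.
    prev_flag = flag.shift(fill_value=0)
    # The preceding vid itself; the first vid falls back to itself.
    vids = flag.index.to_series()
    prev_vid = vids.shift(fill_value=vids.iloc[0])
    # Use the predecessor where it was flagged, else keep the vid.
    mapping = prev_vid.where(prev_flag.eq(1), vids).astype(int)
    return df["vid"].map(mapping)


df = pd.DataFrame({
    "vid":    [14, 15, 16, 16, 17, 17, 18, 18],
    "change": [0, 1, 0, 0, 0, 1, 0, 0],
})
df["vid"] = replace_flagged_vids(df)
print(df)
```

On the question's data this gives the same result as the map and groupby version above, and it should also cope with gaps such as vid values 10, 20, 35.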