I have a dataframe with columns [pixel, total_time], and I want to:
- Make a new column “total_time_one”, which takes the total_time of pixel 1 and projects it onto all rows:
pixel  total_time  total_time_one
1      218.835     218.835         # projected times of pixel 1 onto all values
1      218.835     218.835
1      218.835     218.835
2      219.878     218.835
2      219.878     218.835
2      219.878     218.835
3      220.911     218.835
3      220.911     218.835
3      220.911     218.835
1      230.189     230.189         # value changes because pixel 1 shows up again
1      230.189     230.189
1      230.189     230.189
2      231.441     230.189
2      231.441     230.189
2      231.441     230.189
I have achieved the above dataframe with:
uniqueone = df.query("pixel==1").total_time.unique()
mask = df["total_time"].isin(uniqueone)
df["total_time_one"] = df[mask]["total_time"]  # putting .fillna(method='ffill') here isn't working
df["total_time_one"] = df["total_time_one"].fillna(method='ffill')
However, the code is quite long and repeats itself. Is there a function better suited, or a better solution?
Also, I do not understand why, if I put:
df["total_time_one"] = (df[mask]["total_time"].fillna(method='ffill')
it doesn’t work, and I have to put an extra line:
df["total_time_one"] = df["total_time_one"].fillna(method='ffill')
to make it work.
Answer
Use where to NaN the values that aren't 'pixel'==1, and then ffill. This technically forward fills the last value of each group, but your values are static within each group of pixels.
df['total_time_one'] = df['total_time'].where(df.pixel.eq(1)).ffill()
    pixel  total_time  total_time_one
0       1     218.835         218.835
1       1     218.835         218.835
2       1     218.835         218.835
3       2     219.878         218.835
4       2     219.878         218.835
5       2     219.878         218.835
6       3     220.911         218.835
7       3     220.911         218.835
8       3     220.911         218.835
9       1     230.189         230.189
10      1     230.189         230.189
11      1     230.189         230.189
12      2     231.441         230.189
13      2     231.441         230.189
14      2     231.441         230.189
If you wanted to use the first value within each group (as opposed to the last), or say take the average and then ffill, you can use groupby + transform. You label successive groups of 1s using the cumsum of a != comparison.
df['total_time_one'] = (df['total_time'].where(df.pixel.eq(1))
                          .groupby(df.pixel.ne(1).cumsum())
                          .transform('first')
                          .ffill())
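For example, the averaging variant mentioned above is only a one-word change: swap 'first' for 'mean'. This is a minimal sketch assuming the same df and column names as in the question; in this particular data the result is identical, since total_time is constant within each pixel-1 run.

df['total_time_one'] = (df['total_time'].where(df.pixel.eq(1))
                          .groupby(df.pixel.ne(1).cumsum())
                          .transform('mean')   # average of each pixel-1 run instead of its first value
                          .ffill())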
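As for the question about fillna: df[mask]["total_time"] contains only the pixel-1 rows, so that Series has no NaN and fillna on it is a no-op. The NaN only appear when the shorter Series is aligned back onto the full index during the assignment, which is why the fill has to happen on the new column afterwards (or is avoided entirely with where, as above). A small sketch with made-up toy values showing the two steps:

import pandas as pd

toy = pd.DataFrame({"pixel": [1, 2, 2], "total_time": [10.0, 11.0, 11.0]})
mask = toy["pixel"].eq(1)

selected = toy[mask]["total_time"]          # only the pixel-1 row, no NaN here
print(selected.isna().any())                # False, so fillna on this Series does nothing

toy["total_time_one"] = selected            # index alignment fills the other rows with NaN
print(toy["total_time_one"].isna().any())   # True, this is where the forward fill is needed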