I have the following dataframe:
| col1 | col2 | col3 | | 5 | 3 | 9 | | NaN | 6 | NaN | | NaN | 3 | 7 | | 7 | 8 | 5 | | NaN | 3 | NaN | | 2 | 2 | 4 |
And I want value NaN to be filled with the conditional mean of previous and next value based on the same column.
| col1 | col2 | col3 | | 5 | 3 | 9 | | 6 | 6 | 8 | | 6 | 3 | 7 | | 7 | 8 | 5 | | 4.5 | 3 | 4.5 | | 2 | 2 | 4 |
Just like this, value 6 is the mean with 5 and 7. And this is a little part of my dataframe, so I need to replace all the NaN.
Advertisement
Answer
EDIT:
For replace missing values in all columns use:
df = df.bfill().add(df.ffill()).div(2)
If need repalce only some columns, e.g. numeric:
cols = df.select_dtypes(np.number).columns df[cols] = df[cols].bfill().add(df[cols].ffill()).div(2)
Use:
df = pd.DataFrame({'col':[1,15.6,np.nan, np.nan, 15.8,5, np.nan, 4,10, np.nan, np.nan,np.nan, 7]}) #filter non missing values m = df['col'].notna() #count 2 consecutive NaNs m = df.groupby(m.cumsum()[~m])['col'].transform('size').eq(2) #expand mask to previous and next values for consecutive 2 NaNs mask = m.shift(fill_value=False) | m.shift(-1, fill_value=False) print (mask) 0 False 1 True 2 True 3 True 4 True 5 False 6 False 7 False 8 False 9 False 10 False 11 False 12 False Name: col, dtype: bool
#for filtered rows create means df.loc[mask, 'col'] = df.loc[mask, 'col'].bfill().add(df.loc[mask, 'col'].ffill()).div(2) print (df) col 0 1.0 1 15.6 2 15.7 3 15.7 4 15.8 5 5.0 6 NaN 7 4.0 8 10.0 9 NaN 10 NaN 11 NaN 12 7.0
If need means for all missing values remove mask:
df['col'] = df['col'].bfill().add(df['col'].ffill()).div(2) print (df) col 0 1.0 1 15.6 2 15.7 3 15.7 4 15.8 5 5.0 6 4.5 7 4.0 8 10.0 9 8.5 10 8.5 11 8.5 12 7.0