I have the following dataframe:

    | col1 | col2 | col3 |
    | 5 | 3 | 9 |
    | NaN | 6 | NaN |
    | NaN | 3 | 7 |
    | 7 | 8 | 5 |
    | NaN | 3 | NaN |
    | 2 | 2 | 4 |

I want each NaN to be filled with the mean of the previous and next non-missing values in the same column:

    | col1 | col2 | col3 |
    | 5 | 3 | 9 |
    | 6 | 6 | 8 |
    | 6 | 3 | 7 |
    | 7 | 8 | 5 |
    | 4.5 | 3 | 4.5 |
    | 2 | 2 | 4 |

For example, the value 6 in col1 is the mean of 5 and 7. This is only a small part of my dataframe, so I need to replace all the NaN values this way.
Answer
EDIT:
To replace missing values in all columns, use:

    df = df.bfill().add(df.ffill()).div(2)

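As a quick check, here is a minimal sketch that applies the one-liner to the sample data from the question. The variable name `filled` is only for illustration; the data is copied from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "col1": [5, np.nan, np.nan, 7, np.nan, 2],
    "col2": [3, 6, 3, 8, 3, 2],
    "col3": [9, np.nan, 7, 5, np.nan, 4],
})

# bfill() takes the next valid value, ffill() the previous one;
# for rows that were not missing both are the value itself, so the mean leaves them unchanged
filled = df.bfill().add(df.ffill()).div(2)
print(filled)
```

The filled values should match the expected table in the question (6 and 6 in col1, 8 in col3, 4.5 in the second-to-last row). Note that the result is float in every column because of the division.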
If you need to replace only some columns, e.g. the numeric ones:

    import numpy as np

    # select only the numeric columns
    cols = df.select_dtypes(np.number).columns

    df[cols] = df[cols].bfill().add(df[cols].ffill()).div(2)

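A small sketch of why the column selection matters, assuming a frame with one text column and one numeric column (the column names `name` and `value` are made up for this example). Only the numeric column is touched, and the text column keeps its missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame, not from the question
df = pd.DataFrame({
    "name": ["a", None, "c", "d"],
    "value": [1.0, np.nan, 3.0, 5.0],
})

# Restrict the arithmetic to numeric columns only
cols = df.select_dtypes(np.number).columns
df[cols] = df[cols].bfill().add(df[cols].ffill()).div(2)
print(df)
```

Without the selection, the `add` and `div` steps would also be applied to the text column, which is not meaningful for strings.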
Original answer, which fills only runs of exactly two consecutive NaNs:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'col':[1,15.6,np.nan, np.nan, 15.8,5,
                              np.nan, 4,10, np.nan, np.nan,np.nan, 7]})

    #filter non missing values
    m = df['col'].notna()

    #count 2 consecutive NaNs
    m = df.groupby(m.cumsum()[~m])['col'].transform('size').eq(2)

    #expand mask to previous and next values for consecutive 2 NaNs
    mask = m.shift(fill_value=False) | m.shift(-1, fill_value=False)
    print(mask)

    0     False
    1      True
    2      True
    3      True
    4      True
    5     False
    6     False
    7     False
    8     False
    9     False
    10    False
    11    False
    12    False
    Name: col, dtype: bool

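To make the run-counting step easier to follow, here is a sketch that computes the length of each NaN run explicitly. It uses `value_counts` and `map` instead of the `groupby` call above, so it is an equivalent illustration rather than the answer's exact code, and the names `s`, `present`, `run_id` and `run_len` are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 15.6, np.nan, np.nan, 15.8, 5,
               np.nan, 4, 10, np.nan, np.nan, np.nan, 7])

present = s.notna()

# cumsum() only increases on non-missing rows, so every NaN in the
# same consecutive run shares one id; keep the ids of the NaN rows only
run_id = present.cumsum()[~present]

# length of the run each NaN belongs to
run_len = run_id.map(run_id.value_counts())
print(run_len)

# NaNs that sit in a run of exactly two
print(run_len.eq(2))
```

With this data the runs have lengths 2 (rows 2 and 3), 1 (row 6) and 3 (rows 9 to 11), so only rows 2 and 3 qualify.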

    #for filtered rows create means
    df.loc[mask, 'col'] = df.loc[mask, 'col'].bfill().add(df.loc[mask, 'col'].ffill()).div(2)
    print(df)

         col
    0    1.0
    1   15.6
    2   15.7
    3   15.7
    4   15.8
    5    5.0
    6    NaN
    7    4.0
    8   10.0
    9    NaN
    10   NaN
    11   NaN
    12   7.0

If you need means for all missing values, remove the mask:

    df['col'] = df['col'].bfill().add(df['col'].ffill()).div(2)
    print(df)

         col
    0    1.0
    1   15.6
    2   15.7
    3   15.7
    4   15.8
    5    5.0
    6    4.5
    7    4.0
    8   10.0
    9    8.5
    10   8.5
    11   8.5
    12   7.0

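One edge case worth knowing: a NaN at the very start or end of a column has no previous or no next value, so one of the two fills stays NaN and the mean is NaN as well. The sketch below shows this on a made-up Series; the final `ffill().bfill()` step is just one possible fallback (copy the nearest valid value) and is an assumption, not something the question asked for.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values at both ends
s = pd.Series([np.nan, 2.0, np.nan, 4.0, np.nan])

filled = s.bfill().add(s.ffill()).div(2)
print(filled)  # first and last rows remain NaN

# Optional fallback (an assumption): use the nearest valid value at the edges
edges_filled = filled.ffill().bfill()
print(edges_filled)
```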