I want to impute missing data using multivariate imputation. In the data set below, column A has some missing values, and columns A and B have a correlation of 0.70. I want to use a regression-type relationship between column A and column B to impute the missing values in Python.
N.B.: I can do it using the mean, median, or mode, but I want to use the relationship with another column to fill in the missing values.
How should I approach this problem? Please share your solution.
import pandas as pd
from sklearn.impute import SimpleImputer  # sklearn.preprocessing.Imputer was removed in scikit-learn 0.22
import numpy as np


# assign data of lists.
data = {'Date': ['9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/21/14'],
        'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04, np.nan, np.nan, 19.43, 54.375, 38.41],
        'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675, 2.05, 4.06, -0.80, 0.45, -0.90],
        'C': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c']}

# Create DataFrame
df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"])
# Print the output.
print(df)
Answer
Use:
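# Rows with a known A are used to fit the model; rows with a missing A get imputed
dfreg = df[df['A'].notna()]
dfimp = df[df['A'].isna()]

from sklearn.neural_network import MLPRegressor
# Fit a regressor that predicts A from B; reshape(-1, 1) gives the 2-D feature array scikit-learn expects
regr = MLPRegressor(random_state=1, max_iter=200).fit(dfreg['B'].values.reshape(-1, 1), dfreg['A'])
# R^2 of the fit on the training rows
regr.score(dfreg['B'].values.reshape(-1, 1), dfreg['A'])

# Predicted A values for the rows where A is missing
regr.predict(dfimp['B'].values.reshape(-1, 1))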
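Before relying on any regression-based fill, it may help to check how strongly A and B are related in the rows where both are present. The snippet below is a minimal sketch that rebuilds only the A and B columns from the question; the variable name `corr_ab` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same A and B values as in the question.
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Series.corr uses Pearson correlation by default and ignores
# rows where either value is NaN.
corr_ab = df['A'].corr(df['B'])
print(f"Pearson correlation between A and B: {corr_ab:.3f}")
```

If the full data set really has a correlation near 0.70, the same check on it should confirm that; this 13-row sample does not.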
Note that in the provided data the correlation between columns A and B is very low (less than 0.05). To replace the empty cells with the imputed values:
s = df[df['A'].isna()]['A'].index
# Use regr.predict here: regr.score returns a single R^2 value, not one value per missing row
df.loc[s, 'A'] = regr.predict(dfimp['B'].values.reshape(-1, 1))
Output:
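Since the question asks for a regression-type relationship, a plain linear regression of A on B might also be worth trying in place of the neural network. This is a sketch of that swapped-in technique, not the method used in the answer above; the names `known`, `model`, and `df_filled` are introduced here for illustration, and only columns A and B from the question are rebuilt.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same A and B values as in the question.
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Boolean mask of rows where A is observed.
known = df['A'].notna()

# Fit A ~ B on the observed rows only.
model = LinearRegression().fit(df.loc[known, ['B']], df.loc[known, 'A'])

# Fill the missing A values with the model's predictions,
# working on a copy so the original frame stays unchanged.
df_filled = df.copy()
df_filled.loc[~known, 'A'] = model.predict(df.loc[~known, ['B']])
print(df_filled)
```

With a correlation this weak in the sample, the fitted line is nearly flat, so the filled values will sit close to the mean of A. On data where A and B are more strongly related, the predictions should vary more with B.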