Missing value Imputation based on regression in pandas

I want to impute missing data using multivariate imputation. In the data set below, column A has some missing values, and columns A and B have a correlation of 0.70. I want to use a regression-style relationship between column A and column B to impute the missing values in Python.

N.B.: I can do it using the mean, median, or mode, but I want to use the relationship with another column to fill the missing values.

How should I approach this problem?

import numpy as np
import pandas as pd

# Note: sklearn.preprocessing.Imputer was removed in scikit-learn 0.22
# and was not used below, so the import is dropped here.

# Build the data as a dict of lists.
data = {'Date': ['9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/19/14', '9/20/14', '9/21/14', '9/21/14', '9/21/14'],
        'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04, np.nan, np.nan, 19.43, 54.375, 38.41],
        'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675, 2.05, 4.06, -0.80, 0.45, -0.90],
        'C': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'c']}

# Create the DataFrame and parse the month/day/year strings as dates.
df = pd.DataFrame(data)
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%y")

# Print the result.
print(df)


Answer

Use:

dfreg = df[df['A'].notna()]
dfimp = df[df['A'].isna()]

from sklearn.neural_network import MLPRegressor    
regr = MLPRegressor(random_state=1, max_iter=200).fit(dfreg['B'].values.reshape(-1, 1), dfreg['A'])
regr.score(dfreg['B'].values.reshape(-1, 1), dfreg['A'])

regr.predict(dfimp['B'].values.reshape(-1, 1))

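Note that in the provided data the correlation between columns A and B is very low (less than .05). To replace the empty cells with the imputed values:

# Index of the rows where A is missing (same rows and order as dfimp).
s = df[df['A'].isna()].index
# Assign the model's predictions, not its score, to the missing cells.
df.loc[s, 'A'] = regr.predict(dfimp['B'].values.reshape(-1, 1))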
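
As a simpler alternative to the neural-network regressor, a plain linear fit can express the "regression kind of relationship" the question asks about. The following is a minimal sketch, assuming scikit-learn's LinearRegression is acceptable in place of MLPRegressor. The names `corr`, `missing` and `model` are illustrative and do not come from the original answer. It also prints the correlation between A and B, so the strength of the relationship can be checked before trusting the imputed values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same A and B values as in the question (Date and C omitted for brevity).
df = pd.DataFrame({
    'A': [77.13, 39.58, 33.70, np.nan, np.nan, 39.66, 64.625, 80.04,
          np.nan, np.nan, 19.43, 54.375, 38.41],
    'B': [19.5, 21.61, 22.25, 25.05, 24.20, 23.55, 5.70, 2.675,
          2.05, 4.06, -0.80, 0.45, -0.90],
})

# Pearson correlation; pandas ignores rows where either value is NaN.
corr = df['A'].corr(df['B'])
print(f"correlation(A, B) = {corr:.3f}")

# Boolean mask of the rows where A needs to be imputed.
missing = df['A'].isna()

# Fit A as a linear function of B using only the complete rows.
model = LinearRegression().fit(df.loc[~missing, ['B']], df.loc[~missing, 'A'])

# Predict A for the rows where it is missing and write the values back.
df.loc[missing, 'A'] = model.predict(df.loc[missing, ['B']])
print(df)
```

If the printed correlation is close to zero, the fitted line is nearly flat and the imputed values will end up close to the mean of A, so the regression adds little over mean imputation in that case.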


User contributions licensed under: CC BY-SA