How to output a new dataframe with mismatched columns from two dataframes

Question

I want a function that creates a new dataframe from two input dataframes. The new dataframe should show the rows where the values in a given column do not match for the same id number.

Accepted Answer

STEP 1 // Add the table name as a suffix to each column name

    import pandas as pd

    df1.columns = df1.columns + '_df1'
    df2.columns = df2.columns + '_df2'

STEP 2 // Concat both dataframes on the id column, keeping ids that appear in only one of them

    data = pd.concat([df1.set_index('first_column_df1'),
                      df2.set_index('first_column_df2')],
                     axis=1, join='outer').reset_index()

STEP 3 // Use a lambda function to find the rows where the second column does not match. The lambda returns True for a mismatch, and only the rows where the condition is True are kept

    data = data[data.apply(lambda x: x.second_column_df1 != x.second_column_df2, axis=1)]

STEP 4 // Select the columns for the desired output

    data[['index', 'second_column_df1', 'second_column_df2']].reset_index(drop=True)

Output:

       index   second_column_df1   second_column_df2
    0  id1     1                   3
    1  id2     2                   4
    2  id4     NaN                 2
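Since the question asks for a function, the steps above could be wrapped up roughly as follows. This is only a sketch under some assumptions: the function name `mismatched_rows`, its parameters, and the sample data are hypothetical (the sample values were chosen to be consistent with the output shown above), and ids are assumed to be unique within each dataframe. It also compares the columns with `Series.ne` instead of a row-wise lambda, and treats two missing values as equal, which the lambda version does not.

```python
import pandas as pd


def mismatched_rows(df1, df2, id_col, value_col):
    """Return rows whose value_col differs between df1 and df2 for the same id.

    Hypothetical helper; assumes id_col values are unique in each dataframe.
    """
    # Index by id and suffix the value column so both versions stay distinguishable.
    left = df1.set_index(id_col)[[value_col]].add_suffix('_df1')
    right = df2.set_index(id_col)[[value_col]].add_suffix('_df2')

    # Outer join keeps ids that exist in only one of the dataframes.
    data = pd.concat([left, right], axis=1, join='outer')

    a = data[value_col + '_df1']
    b = data[value_col + '_df2']

    # A row is a mismatch when the values differ, except when both are missing.
    mismatch = a.ne(b) & ~(a.isna() & b.isna())

    return data[mismatch].rename_axis(id_col).reset_index()


# Hypothetical sample data, not taken from the original question.
df1 = pd.DataFrame({'first_column': ['id1', 'id2', 'id3'],
                    'second_column': [1, 2, 5]})
df2 = pd.DataFrame({'first_column': ['id1', 'id2', 'id3', 'id4'],
                    'second_column': [3, 4, 5, 2]})

result = mismatched_rows(df1, df2, 'first_column', 'second_column')
print(result)
```

Unlike the step-by-step version, this sketch does not rename the columns of the input dataframes in place, so `df1` and `df2` can be reused afterwards.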
