I want to have a function that creates a new dataframe from two dataframes.
I want to show the mismatched columns based on id number and a given column.
dataframes as input:
JavaScript
x
25
25
1
import pandas as pd
2
3
data1 = {
4
'first_column': ['id1', 'id2', 'id3'],
5
'second_column': ['1', '2', '2'],
6
'third_column': ['1', '2', '2'],
7
'fourth_column': ['1', '2', '1']
8
}
9
10
df1 = pd.DataFrame(data1)
11
print("n")
12
print("df1")
13
print(df1)
14
15
16
17
data2 = {
18
'first_column': ['id1', 'id2', 'id3', 'id4'],
19
'second_column': ['3', '4', '2', '2'],
20
'third_column': ['1', '2', '2', '2'],
21
'fourth_column': ['1', '2', '2', '2']
22
}
23
24
df2 = pd.DataFrame(data2)
25
expected output:
Advertisement
Answer
STEP 1 // add the table name Prefix on column name
JavaScript
1
3
1
df1.columns = df1.columns + '_df1'
2
df2.columns = df2.columns + '_df2'
3
STEP 2 // Concat both df
JavaScript
1
2
1
data = pd.concat([df1.set_index('first_column_df1'),df2.set_index('first_column_df2')],axis=1, join='outer').reset_index()
2
STEP 3 // Using lambda function findout which row second column does math if does match return True and print only DF rows where condition came True
JavaScript
1
2
1
data = data[data.apply(lambda x: x.second_column_df1 != x.second_column_df2 ,axis=1)]
2
STEP 4 // To achieve desire output
JavaScript
1
2
1
data[['index', 'second_column_df1', 'second_column_df2']].reset_index(drop=True)
2
Output:
JavaScript
1
5
1
index second_column_df1 second_column_df2
2
0 id1 1 3
3
1 id2 2 4
4
2 id4 NaN 2
5