I have two dataframes which I need to compare between two columns based on condition and print the output. For example:
df1:
JavaScript
x
5
1
| ID | Date | value |
2
| 248 | 2021-10-30| 4.5 |
3
| 249 | 2021-09-21| 5.0 |
4
| 100 | 2021-02-01| 3,2 |
5
df2:
JavaScript
1
10
10
1
| ID | Date | value |
2
| 245 | 2021-12-14| 4.5 |
3
| 246 | 2021-09-21| 5.0 |
4
| 247 | 2021-10-30| 3,2 |
5
| 248 | 2021-10-30| 3,1 |
6
| 249 | 2021-10-30| 2,2 |
7
| 250 | 2021-10-30| 6,3 |
8
| 251 | 2021-10-30| 9,1 |
9
| 252 | 2021-10-30| 2,0 |
10
I want to write a code which compares ID column and date column between two dataframes is having a conditions like below,
if “ID and date is matching from df1 to df2”: print(df1[‘compare’] = ‘Both matching’)
if “ID is matching and date is not matching from df1 to df2” : print(df1[‘compare’] = ‘Date not matching’)
if “ID is Not matching from df1 to df2” : print(df1[‘compare’] = ‘ID not available’)
My result df1
should look like below:
df1 (expected result):
JavaScript
1
5
1
| ID | Date | value | compare
2
| 248 | 2021-10-30| 4.5 | Both matching
3
| 249 | 2021-09-21| 5.0 | Id matching - Date not matching
4
| 100 | 2021-02-01| 3,2 | Id not available
5
how to do this with Python pandas dataframe?
Advertisement
Answer
What I suggest you do is to use iterrows
. It might not be the best idea, but still can solve your problem:
JavaScript
1
17
17
1
compareColumn = []
2
for index, row in df1.iterrows():
3
df2Row = df2[df2["ID"] == row["ID"]]
4
if df2Row.shape[0] == 0:
5
compareColumn.append("ID not available")
6
else:
7
check = False
8
for jndex, row2 in df2Row.iterrows():
9
if row2["Date"] == row["Date"]:
10
compareColumn.append("Both matching")
11
check = True
12
break
13
if check == False:
14
compareColumn.append("Date not matching")
15
df1["compare"] = compareColumn
16
df1
17
Output
ID | Date | value | compare | |
---|---|---|---|---|
0 | 248 | 2021-10-30 | 4.5 | Both matching |
1 | 249 | 2021-09-21 | 5 | Date not matching |
2 | 100 | 2021-02-01 | 3.2 | ID not available |