I have two dataframes, and I need to compare two of their columns against each other and record the result based on some conditions. For example:
df1:
| ID | Date | value |
|---|---|---|
| 248 | 2021-10-30 | 4.5 |
| 249 | 2021-09-21 | 5.0 |
| 100 | 2021-02-01 | 3,2 |
df2:
| ID | Date | value |
|---|---|---|
| 245 | 2021-12-14 | 4.5 |
| 246 | 2021-09-21 | 5.0 |
| 247 | 2021-10-30 | 3,2 |
| 248 | 2021-10-30 | 3,1 |
| 249 | 2021-10-30 | 2,2 |
| 250 | 2021-10-30 | 6,3 |
| 251 | 2021-10-30 | 9,1 |
| 252 | 2021-10-30 | 2,0 |
I want to write code that compares the ID and Date columns between the two dataframes and fills a new column in df1 according to the conditions below:
if ID and Date both match between df1 and df2: df1['compare'] = 'Both matching'
if ID matches but Date does not match between df1 and df2: df1['compare'] = 'Date not matching'
if the ID from df1 is not found in df2: df1['compare'] = 'ID not available'
My resulting df1 should look like below:
df1 (expected result):
| ID | Date | value | compare |
|---|---|---|---|
| 248 | 2021-10-30 | 4.5 | Both matching |
| 249 | 2021-09-21 | 5.0 | Id matching - Date not matching |
| 100 | 2021-02-01 | 3,2 | Id not available |
How can I do this with a pandas DataFrame in Python?
Answer
I suggest using iterrows. It might not be the most efficient approach, but it solves your problem:
    compareColumn = []
    for index, row in df1.iterrows():
        # All rows of df2 that share this row's ID
        df2Row = df2[df2["ID"] == row["ID"]]
        if df2Row.shape[0] == 0:
            compareColumn.append("ID not available")
        else:
            check = False
            # Look for a df2 row with the same ID that also has the same Date
            for jndex, row2 in df2Row.iterrows():
                if row2["Date"] == row["Date"]:
                    compareColumn.append("Both matching")
                    check = True
                    break
            if not check:
                compareColumn.append("Date not matching")
    df1["compare"] = compareColumn
    df1
Output
| | ID | Date | value | compare |
|---|---|---|---|---|
| 0 | 248 | 2021-10-30 | 4.5 | Both matching |
| 1 | 249 | 2021-09-21 | 5 | Date not matching |
| 2 | 100 | 2021-02-01 | 3.2 | ID not available |
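Since iterrows can be slow on larger frames, here is a minimal vectorized sketch of the same three conditions. It swaps the loop for `Series.isin`, a left `merge` with `indicator=True`, and `numpy.select`. It rests on a few assumptions that are not stated in the question: the Date columns are plain strings in the same format in both frames, the comma decimals such as `3,2` have already been read as floats, and df1 has its default 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumption: "3,2" etc. parsed as 3.2).
df1 = pd.DataFrame({
    "ID": [248, 249, 100],
    "Date": ["2021-10-30", "2021-09-21", "2021-02-01"],
    "value": [4.5, 5.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": [245, 246, 247, 248, 249, 250, 251, 252],
    "Date": ["2021-12-14", "2021-09-21", "2021-10-30", "2021-10-30",
             "2021-10-30", "2021-10-30", "2021-10-30", "2021-10-30"],
    "value": [4.5, 5.0, 3.2, 3.1, 2.2, 6.3, 9.1, 2.0],
})

# True where the ID from df1 appears anywhere in df2.
id_match = df1["ID"].isin(df2["ID"]).to_numpy()

# True where the (ID, Date) pair from df1 appears in df2.
# drop_duplicates keeps the left merge from multiplying df1 rows.
pairs = df2[["ID", "Date"]].drop_duplicates()
merged = df1.merge(pairs, on=["ID", "Date"], how="left", indicator=True)
both_match = (merged["_merge"] == "both").to_numpy()

# Conditions are checked in order; the first one that holds wins.
df1["compare"] = np.select(
    [both_match, id_match],
    ["Both matching", "Date not matching"],
    default="ID not available",
)
print(df1)
```

The masks are converted to NumPy arrays so that the assignment goes by row position rather than by index label. If df1 has a custom index, the left merge still preserves its row order, so the positional assignment should still line up.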