I am facing some difficulties using merge function in Pandas. I am looking for some kind of Vlookup formula to assist me on this. However, I couldn’t solve my problem. My data is huge and I couldn’t share here due to confidential. However, I try to came up with similar data here.
Old Code | New Code | Name | Invoice Date |
---|---|---|---|
1001011 | NA | Cheese Cake | 02/02/2021 |
1001012 | NA | Coffee | 03/05/2021 |
1001011 | NA | Cheese Cake | 30/05/2021 |
NA | 2002093 | Jasmine Tea | 21/08/2021 |
NA | 2002042 | Cookies | 31/12/2021 |
NA | 2002080 | Coffee | 09/01/2022 |
NA | 2002093 | Jasmine Tea | 05/05/2022 |
NA | 2002058 | Cheese Cake | 07/06/2022 |
I would like to have a COST Column input in my table above. However, the cost is very by invoice date (Also take note on the changing of product code). We have 2 cost table. For year 2021:
Old Code | New Code | Name | Jan-21 | Feb-21 | Mar-21 | Apr-21 | May-21 | June-21 | Jul-21 | Aug-21 | Sep-21 | Oct-21 | Nov-21 | Dec-21 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 50 | 51 | 50 | 53 | 54 | 52 | 55 | 53 | 50 | 52 | 53 | 53 |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 5 | 7 | 5 | 6 | 5 | 6 | 6 |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 4 | 4 | 3 | 5 | 3 | 3 | 3 | 3 | 4 |
1001020 | 2002042 | Cookies | 20 | 20 | 21 | 20 | 22 | 20 | 21 | 20 | 22 | 20 | 21 | 22 |
And also for Year 2022:
Old Code | New Code | Name | Jan-22 | Feb-22 | Mar-22 | Apr-22 | May-22 | June-22 | Jul-22 | Aug-22 | Sep-22 | Oct-22 | Nov-22 | Dec-22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 52 | 52 | 55 | 55 | 56 | 52 | NA | NA | NA | NA | NA | NA |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 6.5 | NA | NA | NA | NA | NA | NA |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 5 | 5 | 5.5 | NA | NA | NA | NA | NA | NA |
1001020 | 2002042 | Cookies | 22 | 22 | 23 | 23 | 23.5 | 23 | NA | NA | NA | NA | NA | NA |
So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month. Example: Invoice Date Costing for 03/05/2021 = May_2021
Would you mind to assist me on this?
Highly Appreciated.
Thank you very much
Advertisement
Answer
You need to have the month and code number on both sides when merging, so:
- Create a year-month column in the invoice dataframe that is consistent with the cost table
- Combine two cost tables
- Merge with new code and old code respectively
import pandas as pd
import io
import datetime
invoice_data_text = '''Old Code New Code Name Invoice Date
1001011 NA Cheese Cake 02/02/2021
1001012 NA Coffee 03/05/2021
1001011 NA Cheese Cake 30/05/2021
NA 2002093 Jasmine Tea 21/08/2021
NA 2002042 Cookies 31/12/2021
NA 2002080 Coffee 09/01/2022
NA 2002093 Jasmine Tea 05/05/2022
NA 2002058 Cheese Cake 07/06/2022
'''
cost_2021_text = '''
Old Code New Code Name Jan-21 Feb-21 Mar-21 Apr-21 May-21 June-21 Jul-21 Aug-21 Sep-21 Oct-21 Nov-21 Dec-21
1001011 2002058 Cheese Cake 50 51 50 53 54 52 55 53 50 52 53 53
1001012 2002080 Coffee 5 6 5 6 6 5 7 5 6 5 6 6
1001015 2002093 Jasmine Tea 4 3 3 4 4 3 5 3 3 3 3 4
1001020 2002042 Cookies 20 20 21 20 22 20 21 20 22 20 21 22
'''
cost_2022_text = '''
Old Code New Code Name Jan-22 Feb-22 Mar-22 Apr-22 May-22 June-22 Jul-22 Aug-22 Sep-22 Oct-22 Nov-22 Dec-22
1001011 2002058 Cheese Cake 52 52 55 55 56 52 NA NA NA NA NA NA
1001012 2002080 Coffee 5 6 5 6 6 6.5 NA NA NA NA NA NA
1001015 2002093 Jasmine Tea 4 3 3 5 5 5.5 NA NA NA NA NA NA
1001020 2002042 Cookies 22 22 23 23 23.5 23 NA NA NA NA NA NA
'''
# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='t')
cost22 = pd.read_csv(io.StringIO(cost_2022_text),sep='t')
# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))
# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)
# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")
# Combine result
pd.concat([old_code_result,new_code_result])