I am facing some difficulties using merge function in Pandas. I am looking for some kind of Vlookup formula to assist me on this. However, I couldn’t solve my problem. My data is huge and I couldn’t share here due to confidential. However, I try to came up with similar data here.
Old Code | New Code | Name | Invoice Date |
---|---|---|---|
1001011 | NA | Cheese Cake | 02/02/2021 |
1001012 | NA | Coffee | 03/05/2021 |
1001011 | NA | Cheese Cake | 30/05/2021 |
NA | 2002093 | Jasmine Tea | 21/08/2021 |
NA | 2002042 | Cookies | 31/12/2021 |
NA | 2002080 | Coffee | 09/01/2022 |
NA | 2002093 | Jasmine Tea | 05/05/2022 |
NA | 2002058 | Cheese Cake | 07/06/2022 |
I would like to have a COST Column input in my table above. However, the cost is very by invoice date (Also take note on the changing of product code). We have 2 cost table. For year 2021:
Old Code | New Code | Name | Jan-21 | Feb-21 | Mar-21 | Apr-21 | May-21 | June-21 | Jul-21 | Aug-21 | Sep-21 | Oct-21 | Nov-21 | Dec-21 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 50 | 51 | 50 | 53 | 54 | 52 | 55 | 53 | 50 | 52 | 53 | 53 |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 5 | 7 | 5 | 6 | 5 | 6 | 6 |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 4 | 4 | 3 | 5 | 3 | 3 | 3 | 3 | 4 |
1001020 | 2002042 | Cookies | 20 | 20 | 21 | 20 | 22 | 20 | 21 | 20 | 22 | 20 | 21 | 22 |
And also for Year 2022:
Old Code | New Code | Name | Jan-22 | Feb-22 | Mar-22 | Apr-22 | May-22 | June-22 | Jul-22 | Aug-22 | Sep-22 | Oct-22 | Nov-22 | Dec-22 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1001011 | 2002058 | Cheese Cake | 52 | 52 | 55 | 55 | 56 | 52 | NA | NA | NA | NA | NA | NA |
1001012 | 2002080 | Coffee | 5 | 6 | 5 | 6 | 6 | 6.5 | NA | NA | NA | NA | NA | NA |
1001015 | 2002093 | Jasmine Tea | 4 | 3 | 3 | 5 | 5 | 5.5 | NA | NA | NA | NA | NA | NA |
1001020 | 2002042 | Cookies | 22 | 22 | 23 | 23 | 23.5 | 23 | NA | NA | NA | NA | NA | NA |
So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month. Example: Invoice Date Costing for 03/05/2021 = May_2021
Would you mind to assist me on this?
Highly Appreciated.
Thank you very much
Advertisement
Answer
You need to have the month and code number on both sides when merging, so:
- Create a year-month column in the invoice dataframe that is consistent with the cost table
- Combine two cost tables
- Merge with new code and old code respectively
import pandas as pd import io import datetime invoice_data_text = '''Old Code New Code Name Invoice Date 1001011 NA Cheese Cake 02/02/2021 1001012 NA Coffee 03/05/2021 1001011 NA Cheese Cake 30/05/2021 NA 2002093 Jasmine Tea 21/08/2021 NA 2002042 Cookies 31/12/2021 NA 2002080 Coffee 09/01/2022 NA 2002093 Jasmine Tea 05/05/2022 NA 2002058 Cheese Cake 07/06/2022 ''' cost_2021_text = ''' Old Code New Code Name Jan-21 Feb-21 Mar-21 Apr-21 May-21 June-21 Jul-21 Aug-21 Sep-21 Oct-21 Nov-21 Dec-21 1001011 2002058 Cheese Cake 50 51 50 53 54 52 55 53 50 52 53 53 1001012 2002080 Coffee 5 6 5 6 6 5 7 5 6 5 6 6 1001015 2002093 Jasmine Tea 4 3 3 4 4 3 5 3 3 3 3 4 1001020 2002042 Cookies 20 20 21 20 22 20 21 20 22 20 21 22 ''' cost_2022_text = ''' Old Code New Code Name Jan-22 Feb-22 Mar-22 Apr-22 May-22 June-22 Jul-22 Aug-22 Sep-22 Oct-22 Nov-22 Dec-22 1001011 2002058 Cheese Cake 52 52 55 55 56 52 NA NA NA NA NA NA 1001012 2002080 Coffee 5 6 5 6 6 6.5 NA NA NA NA NA NA 1001015 2002093 Jasmine Tea 4 3 3 5 5 5.5 NA NA NA NA NA NA 1001020 2002042 Cookies 22 22 23 23 23.5 23 NA NA NA NA NA NA ''' # Prepare invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="t",parse_dates=["Invoice Date"]) cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='t') cost22 = pd.read_csv(io.StringIO(cost_2022_text),sep='t') # Create Month column for merging invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y")) # Combine two cost tables cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost") cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost") cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1) # Merge with new code and old code respectively old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left") new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left") # Combine result pd.concat([old_code_result,new_code_result])