Skip to content
Advertisement

Python – Anyone mind to assist in this Pandas Dataframe problem? URGENT

I am facing some difficulties using merge function in Pandas. I am looking for some kind of Vlookup formula to assist me on this. However, I couldn’t solve my problem. My data is huge and I couldn’t share here due to confidential. However, I try to came up with similar data here.

Old Code New Code Name Invoice Date
1001011 NA Cheese Cake 02/02/2021
1001012 NA Coffee 03/05/2021
1001011 NA Cheese Cake 30/05/2021
NA 2002093 Jasmine Tea 21/08/2021
NA 2002042 Cookies 31/12/2021
NA 2002080 Coffee 09/01/2022
NA 2002093 Jasmine Tea 05/05/2022
NA 2002058 Cheese Cake 07/06/2022

I would like to have a COST Column input in my table above. However, the cost is very by invoice date (Also take note on the changing of product code). We have 2 cost table. For year 2021:

Old Code New Code Name Jan-21 Feb-21 Mar-21 Apr-21 May-21 June-21 Jul-21 Aug-21 Sep-21 Oct-21 Nov-21 Dec-21
1001011 2002058 Cheese Cake 50 51 50 53 54 52 55 53 50 52 53 53
1001012 2002080 Coffee 5 6 5 6 6 5 7 5 6 5 6 6
1001015 2002093 Jasmine Tea 4 3 3 4 4 3 5 3 3 3 3 4
1001020 2002042 Cookies 20 20 21 20 22 20 21 20 22 20 21 22

And also for Year 2022:

Old Code New Code Name Jan-22 Feb-22 Mar-22 Apr-22 May-22 June-22 Jul-22 Aug-22 Sep-22 Oct-22 Nov-22 Dec-22
1001011 2002058 Cheese Cake 52 52 55 55 56 52 NA NA NA NA NA NA
1001012 2002080 Coffee 5 6 5 6 6 6.5 NA NA NA NA NA NA
1001015 2002093 Jasmine Tea 4 3 3 5 5 5.5 NA NA NA NA NA NA
1001020 2002042 Cookies 22 22 23 23 23.5 23 NA NA NA NA NA NA

So basically, I would like to have my cost column in my first Data Frame to reflect the correct costing for different Year and different Month. Example: Invoice Date Costing for 03/05/2021 = May_2021

Would you mind to assist me on this?

Highly Appreciated.

Thank you very much

Advertisement

Answer

You need to have the month and code number on both sides when merging, so:

  1. Create a year-month column in the invoice dataframe that is consistent with the cost table
  2. Combine two cost tables
  3. Merge with new code and old code respectively

import pandas as pd
import io
import datetime

invoice_data_text = '''Old Code New Code    Name    Invoice Date
1001011 NA  Cheese Cake 02/02/2021
1001012 NA  Coffee  03/05/2021
1001011 NA  Cheese Cake 30/05/2021
NA  2002093 Jasmine Tea 21/08/2021
NA  2002042 Cookies 31/12/2021
NA  2002080 Coffee  09/01/2022
NA  2002093 Jasmine Tea 05/05/2022
NA  2002058 Cheese Cake 07/06/2022
'''

cost_2021_text = '''
Old Code    New Code    Name    Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21 Jul-21  Aug-21  Sep-21  Oct-21  Nov-21  Dec-21
1001011 2002058 Cheese Cake 50  51  50  53  54  52  55  53  50  52  53  53
1001012 2002080 Coffee  5   6   5   6   6   5   7   5   6   5   6   6
1001015 2002093 Jasmine Tea 4   3   3   4   4   3   5   3   3   3   3   4
1001020 2002042 Cookies 20  20  21  20  22  20  21  20  22  20  21  22
'''

cost_2022_text = '''
Old Code    New Code    Name    Jan-22  Feb-22  Mar-22  Apr-22  May-22  June-22 Jul-22  Aug-22  Sep-22  Oct-22  Nov-22  Dec-22
1001011 2002058 Cheese Cake 52  52  55  55  56  52  NA  NA  NA  NA  NA  NA
1001012 2002080 Coffee  5   6   5   6   6   6.5 NA  NA  NA  NA  NA  NA
1001015 2002093 Jasmine Tea 4   3   3   5   5   5.5 NA  NA  NA  NA  NA  NA
1001020 2002042 Cookies 22  22  23  23  23.5    23  NA  NA  NA  NA  NA  NA
'''

# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='t')
cost22 =  pd.read_csv(io.StringIO(cost_2022_text),sep='t')

# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))

# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)

# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")

# Combine result
pd.concat([old_code_result,new_code_result])
    
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement