I currently have two dataFrames that look like this:
Df3 - which is the output dataFrame: | CompanyNm | CpID | Date | :--------|:-----------------------------------|:-------------|:----------| 0 | {Converting) | {C} | 2020-02-14| 1 | {Converting) | {C} | 2020-02-17| 2 | {Converting) | {C} | 2020-02-18| 3 | {Converting) | {C} | 2020-02-19| 4 | {Converting) | {C} | 2020-02-20| ... | ... | ... | ... | 521473 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-08| 521474 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-09| 521475 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-10| 521476 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-11| 521477 |ZHENGZHOU WARMSHINE PAPER CO., LTD. | 2316 | 2022-02-14|
Df4:
| BuySellNo | CustID | InvoiceDt | DueDt | SalesAmt | :-----|:------------|:-------------|:----------|:----------|------------| 0 | 100 | 1047 | 2017-07-10| 2017-11-04| 46137.30 | 1 | 1000 | 1145 | 2017-10-12| 2018-04-03| 362133.15 | 2 | 10000 | 1047 | 2019-12-27| 2020-02-25| 28445.90 | 3 | 10001 | 2007 | 2019-12-27| 2020-02-25| 28445.90 | 4 | 10002 | 2304 | 2019-12-26| 2020-01-03| 27558.62 | ... | ...| ...| ...| ...| ... | 21362 | ZD229 | 1064 | 2017-07-24| 2017-10-11| 0.00 | 21363 | ZD230 | 1064 | 2017-06-14| 2017-10-12| 1400.39 | 21364 | ZD231 | 1064 | 2017-06-13| 2017-10-11| 1967.19 | 21365 | ZD232 | 1063 | 2017-06-28| 2017-06-28| 128.26 | 21366 | ZD233 | 1064 | 2017-06-14| 2017-10-12| 3146.06 |
I am trying to add a new column to Df3, which is the sum of all Sales (SalesAmt) where the invoicing date (InvoiceDt) is prior to the date column in Df3.
df3['Total sales'] = df4[['SalesAmt']].sum(axis=1).where(df3['Date'] <= df4['InvoiceDt'], 0)
I get the following error in this case:
ValueError: Can only compare identically-labeled Series objects
Any idea how to fix this? Or a more efficient way to obtain the wanted result?
Thanks a lot,
Advertisement
Answer
EDIT: I misunderstood your problem definition earlier. Now corrected it:
def func(g): mask = (df4['CustID'] == g.name[0]) & (df4['InvoiceDt'] <= g.name[1]) return df4[mask]['SalesAmt'].sum() df3.groupby(['CpID','Date']).apply(func)