Skip to content
Advertisement

Calculating the sum of the quantities of one dataframe based on dates in another dataframe (Python)

Suppose I have a first df like this:

df1:

 item     date1         date2
   1    2020-06-21    2020-06-28
   2    2020-05-13    2020-05-24
   3    2020-06-20    2020-06-28

I also have a second df (df2) with the items, a date and a quantity

df2:

 item     quantity       date
   1         5        2020-06-24
   1         8        2020-06-20
   1         12       2020-06-27
   1         9        2020-06-29
   2         10       2020-05-24
   2         11       2020-05-15
   2         18       2020-05-18
   2         9        2020-05-14
   3         7        2020-06-18
   3         12       2020-06-21
   3         13       2020-06-24
   3         8        2020-06-28

Now I want to sum the quantities from df2 where the date is between the columns date1 and date2. So my result would look like:

df3:

 item     date1         date2        sum
   1    2020-06-21    2020-06-28     17
   2    2020-05-13    2020-05-24     48
   3    2020-06-20    2020-06-28     33

I’ve been starring at it for a while now and I really want to avoid a loop.

Is there an efficient way of obtaining the desired result??

Advertisement

Answer

df = df2.merge(df1, on = 'item', how = 'left')
df[['date', 'date1', 'date2']] = df[['date', 'date1', 'date2']].apply(pd.to_datetime)
df = df[ (df['date'] >=df['date1']) & (df['date'] <=df['date2'])]
df = df.groupby(['item','date1','date2']).agg({'quantity':'sum'}).reset_index()
  

output:

   item  date1        date2    quantity
0   1   2020-06-21  2020-06-28  17
1   2   2020-05-13  2020-05-24  48
2   3   2020-06-20  2020-06-28  33
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement