Suppose I have a first df like this:
df1:
item  date1       date2
1     2020-06-21  2020-06-28
2     2020-05-13  2020-05-24
3     2020-06-20  2020-06-28
I also have a second df (df2) with the items, a date and a quantity:
df2:
item  quantity  date
1     5         2020-06-24
1     8         2020-06-20
1     12        2020-06-27
1     9         2020-06-29
2     10        2020-05-24
2     11        2020-05-15
2     18        2020-05-18
2     9         2020-05-14
3     7         2020-06-18
3     12        2020-06-21
3     13        2020-06-24
3     8         2020-06-28
Now, for each item, I want to sum the quantities from df2 whose date falls between that item's date1 and date2 in df1 (both ends inclusive). So my result would look like:
df3:
item  date1       date2       sum
1     2020-06-21  2020-06-28  17
2     2020-05-13  2020-05-24  48
3     2020-06-20  2020-06-28  33
I’ve been staring at it for a while now and I really want to avoid a loop.
Is there an efficient way of obtaining the desired result?
Answer
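import pandas as pd

# Attach each item's date window (date1, date2) to every df2 row
df = df2.merge(df1, on='item', how='left')
# Make sure all three date columns are datetimes so they compare correctly
df[['date', 'date1', 'date2']] = df[['date', 'date1', 'date2']].apply(pd.to_datetime)
# Keep rows whose date lies inside the window (inclusive), then sum per item
df = df[(df['date'] >= df['date1']) & (df['date'] <= df['date2'])]
df = df.groupby(['item', 'date1', 'date2']).agg({'quantity': 'sum'}).reset_index()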
output:
   item  date1       date2       quantity
0  1     2020-06-21  2020-06-28  17
1  2     2020-05-13  2020-05-24  48
2  3     2020-06-20  2020-06-28  33
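For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch of the same merge, filter and group idea, built on the sample data from the question. The helper name `sum_in_window` is made up for illustration. It also makes two changes that go beyond the answer above and are only assumptions about what may be wanted: the aggregated column is renamed to `sum` to match the desired df3, and the sums are merged back onto df1 so that an item with no quantities inside its window shows up with 0 instead of disappearing.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "item": [1, 2, 3],
    "date1": pd.to_datetime(["2020-06-21", "2020-05-13", "2020-06-20"]),
    "date2": pd.to_datetime(["2020-06-28", "2020-05-24", "2020-06-28"]),
})
df2 = pd.DataFrame({
    "item": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "quantity": [5, 8, 12, 9, 10, 11, 18, 9, 7, 12, 13, 8],
    "date": pd.to_datetime([
        "2020-06-24", "2020-06-20", "2020-06-27", "2020-06-29",
        "2020-05-24", "2020-05-15", "2020-05-18", "2020-05-14",
        "2020-06-18", "2020-06-21", "2020-06-24", "2020-06-28",
    ]),
})


def sum_in_window(windows, quantities):
    """Hypothetical helper: sum `quantity` per item inside [date1, date2]."""
    # Pair every quantity row with the date window of its item
    merged = quantities.merge(windows, on="item", how="inner")
    # Series.between is inclusive on both ends by default
    in_window = merged["date"].between(merged["date1"], merged["date2"])
    sums = (
        merged[in_window]
        .groupby(["item", "date1", "date2"], as_index=False)["quantity"]
        .sum()
        .rename(columns={"quantity": "sum"})
    )
    # Assumption: windows without any matching rows should be kept with sum 0
    out = windows.merge(sums, on=["item", "date1", "date2"], how="left")
    out["sum"] = out["sum"].fillna(0).astype(int)
    return out


df3 = sum_in_window(df1, df2)
print(df3)
```

On the sample data the `sum` column comes out as 17, 48 and 33, matching the desired df3. Note that the merge builds one row per (df2 row, matching df1 window) pair before filtering, so with very large frames or many windows per item the intermediate table can get big.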