I have the following data frames: df3
Harvest_date | Starting_date |
---|---|
2022-10-06 | 2022-08-06 |
2022-02-22 | 2021-12-22 |
df (I have all temp and humid starting from 2021-01-01 till the present)
date | temp | humid |
---|---|---|
2022-10-06 00:30:00 | 2 | 30 |
2022-10-06 00:01:00 | 1 | 30 |
2022-10-06 00:01:30 | 0 | 30 |
2022-10-06 00:02:00 | 0 | 30 |
2022-10-06 00:02:30 | -2 | 30 |
I would like to calculate the avg temperature and humidity between the starting_date and harvest_date. I tried this:
JavaScript
x
13
13
1
import pandas as pd
2
df = pd.read_csv (r'C:climate.csv')
3
df3 = pd.read_csv (r'C:Flower_weight_Seson.csv')
4
df['date'] = pd.to_datetime(df.date)
5
df3['Harvest_date'] = pd.to_datetime(df3.Harvest_date)
6
df3['Starting_date'] = pd.to_datetime(df3.Starting_date)
7
df.style.format({"date": lambda t: t.strftime("%Y-%m-%d")})
8
df3.style.format({"Harvest_date": lambda t: t.strftime("%Y-%m-%d")})
9
df3.style.format({"Starting_date": lambda t: t.strftime("%Y-%m-%d")})
10
for harvest_date,starting_date in zip(df3['Harvest_date'],df3['Starting_date']):
11
df3["Season avg temp"]= df[df["date"].between(starting_date,harvest_date)]["temp"].mean()
12
df3["Season avg humid"]= df[df["date"].between(starting_date,harvest_date)]["humid"].mean()
13
I get the same value for all dates. Can someone point out what I did wrong, please?
Advertisement
Answer
Use DataFrame.loc
with match indices by means of another DataFrame
:
JavaScript
1
19
19
1
#changed data for match with df3
2
print (df)
3
date temp humid
4
0 2022-10-06 00:30:00 2 30
5
1 2022-09-06 00:01:00 1 33
6
2 2022-09-06 00:01:30 0 23
7
3 2022-10-06 00:02:00 0 30
8
4 2022-01-06 00:02:30 -2 25
9
10
11
for i,harvest_date,starting_date in zip(df3.index,df3['Harvest_date'],df3['Starting_date']):
12
mask = df["date"].between(starting_date,harvest_date)
13
avg = df.loc[mask, ["temp",'humid']].mean()
14
df3.loc[i, ["Season avg temp",'Season avg humid']] = avg.to_numpy()
15
print (df3)
16
Harvest_date Starting_date Season avg temp Season avg humid
17
0 2022-10-06 2022-08-06 0.5 28.0
18
1 2022-02-22 2021-12-220 -2.0 25.0
19
EDIT: For add new condition for match by room
columns use:
JavaScript
1
8
1
for i,harvest_date,starting_date, room in zip(df3.index,
2
df3['Harvest_date'],
3
df3['Starting_date'], df3['Room']):
4
mask = df["date"].between(starting_date,harvest_date) & df['Room'].eq(room)
5
avg = df.loc[mask, ["temp",'humid']].mean()
6
df3.loc[i, ["Season avg temp",'Season avg humid']] = avg.to_numpy()
7
print (df3)
8