I have a daily dataset for different months. I want to calculate the week start(sunday) and week end(saturday) based on each product type & country and values should be the average for that particular week.
dates product country value name 2021-10-01 00:00:00 Voice Lucia 2 A 2021-10-01 00:00:00 TV Jamai 1 A 2021-10-01 00:00:00 TV Trin 5 A 2021-10-01 00:00:00 Voice Gren 5 A 2021-10-01 00:00:00 Broad Vin 7 A 2021-10-01 00:00:00 TV Gren 8 A 2021-10-01 00:00:00 Broad Barb 5 A 2021-10-01 00:00:00 Voice Jamai 23 A 2021-10-01 00:00:00 Voice Trin 6 A 2021-10-01 00:00:00 TV Cur 7 A 2021-10-02 00:00:00 Broad Jamai 2 A 2021-10-03 00:00:00 Broad Trin 8 A 2021-10-04 00:00:00 Broad Lucia 3 A 2021-10-04 00:00:00 TV Anti 1 A 2021-10-04 00:00:00 Broad Cur 8 A 2021-10-04 00:00:00 Voice Barb 0 A 2021-10-04 00:00:00 TV Vin 5 A 2021-10-04 00:00:00 Voice Vin 1 A 2021-10-05 00:00:00 NAN NAN NAN NAN 2021-10-06 00:00:00 NAN NAN NAN NAN 2021-10-07 00:00:00 NAN NAN NAN NAN 2021-10-08 00:00:00 NAN NAN NAN NAN 2021-10-09 00:00:00 NAN NAN NAN NAN 2021-10-10 00:00:00 NAN NAN NAN NAN 2021-10-11 00:00:00 NAN NAN NAN NAN 2021-10-12 00:00:00 NAN NAN NAN NAN 2021-10-13 00:00:00 NAN NAN NAN NAN 2021-10-14 00:00:00 NAN NAN NAN NAN 2021-10-15 00:00:00 NAN NAN NAN NAN ............... ..............................etc
SAMPLE result format:
week_start week_end product country name value(**average of values for each week**)
I tried with groupby but I’m not able to get week start and end for each product and country.
Also Values should be the average(addition) for that particular week. Any help on how to achieve this?
Advertisement
Answer
The first step would be to create a new column for the start date and end date for the date of each row. This can be done by using offsets.Week:
import pandas as pd df['start'] = df['dates'] - pd.offsets.Week(weekday=6) df['end'] = df['start'] + pd.offsets.Week(weekday=5)
From there you can use groupby to group by the start, end, product and country columns and use the mean aggregation method for the value column:
df.groupby(['start','end','product','country']).agg({'value': 'mean'}).reset_index()