I have a data frame in pandas like this:
Name Date A 9/1/21 B 10/20/21 C 9/8/21 D 9/20/21 K 9/29/21 K 9/15/21 M 10/1/21 C 9/12/21 D 9/9/21 C 9/9/21 R 9/20/21
I need to get the count of items by week.
weeks = [9/6/21, 9/13, 9/20/21, 9/27/21, 10/4/21]
Example: From 9/6 to 9/13, the output should be:
Name Weekly count A 0 B 0 C 3 D 1 M 0 K 0 R 0
Similarly, I need to find the count on these intervals: 9/13 to 9/20, 9/20 to 9/27, and 9/27 to 10/4. Thank you!
Advertisement
Answer
May be with the caveat of the definition of the first day of a week, you could take something in the following code.
df = pd.DataFrame(data=d) df['Date']=pd.to_datetime(df['Date'])
I. Discontinuous index
Monday is chosen as the first day of week
#(1) Build a series of first_day_of_week, monday is chosen as the first day of week
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D')
#(2) Groupby and some tidying
df2 = ( df.groupby([df['Name'], weeks_index])
.count()
.rename(columns={'Date':'Count'})
.swaplevel() # weeks to first level
.sort_index()
.unstack(1).fillna(0.0)
.astype(int)
.rename_axis('first_day_of_week')
)
>>> print(df2)
Name A B C D K M R
first_day_of_week
2021-08-30 1 0 0 0 0 0 0
2021-09-06 0 0 3 1 0 0 0
2021-09-13 0 0 0 0 1 0 0
2021-09-20 0 0 0 1 0 0 1
2021-09-27 0 0 0 0 1 1 0
2021-10-18 0 1 0 0 0 0 0
II. Continuous index
This part does not differ much of the previous one.
We build a continuous version of the index to be use to reindex
Monday is chosen as the first day of week (obviouly for the two indices)
#(1a) Build a series of first_day_of_week, monday is chosen as the
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D')
#(1b) Build a continuous series of first_day_of_week
continuous_weeks_index = pd.date_range(start=weeks_index.min(),
end=weeks_index.max(),
freq='W-MON') # monday
#(2) Groupby, unstack, reindex, and some tidying
df2 = ( df
# groupby and count
.groupby([df['Name'], weeks_index])
.count()
.rename(columns={'Date':'Count'})
# unstack on weeks
.swaplevel() # weeks to first level
.sort_index()
.unstack(1)
# reindex to insert weeks with no data
.reindex(continuous_weeks_index) # new index
# clean up
.fillna(0.0)
.astype(int)
.rename_axis('first_day_of_week')
)
>>>print(df2)
Name A B C D K M R
first_day_of_week
2021-08-30 1 0 0 0 0 0 0
2021-09-06 0 0 3 1 0 0 0
2021-09-13 0 0 0 0 1 0 0
2021-09-20 0 0 0 1 0 0 1
2021-09-27 0 0 0 0 1 1 0
2021-10-04 0 0 0 0 0 0 0
2021-10-11 0 0 0 0 0 0 0
2021-10-18 0 1 0 0 0 0 0
Last step if needed
df2.stack()