I have a data frame in pandas like this:
Name Date A 9/1/21 B 10/20/21 C 9/8/21 D 9/20/21 K 9/29/21 K 9/15/21 M 10/1/21 C 9/12/21 D 9/9/21 C 9/9/21 R 9/20/21
I need to get the count of items by week.
weeks = [9/6/21, 9/13, 9/20/21, 9/27/21, 10/4/21]
Example: From 9/6 to 9/13, the output should be:
Name Weekly count A 0 B 0 C 3 D 1 M 0 K 0 R 0
Similarly, I need to find the count on these intervals: 9/13 to 9/20, 9/20 to 9/27, and 9/27 to 10/4. Thank you!
Advertisement
Answer
May be with the caveat of the definition of the first day of a week, you could take something in the following code.
df = pd.DataFrame(data=d) df['Date']=pd.to_datetime(df['Date'])
I. Discontinuous index
Monday is chosen as the first day of week
#(1) Build a series of first_day_of_week, monday is chosen as the first day of week weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D') #(2) Groupby and some tidying df2 = ( df.groupby([df['Name'], weeks_index]) .count() .rename(columns={'Date':'Count'}) .swaplevel() # weeks to first level .sort_index() .unstack(1).fillna(0.0) .astype(int) .rename_axis('first_day_of_week') ) >>> print(df2) Name A B C D K M R first_day_of_week 2021-08-30 1 0 0 0 0 0 0 2021-09-06 0 0 3 1 0 0 0 2021-09-13 0 0 0 0 1 0 0 2021-09-20 0 0 0 1 0 0 1 2021-09-27 0 0 0 0 1 1 0 2021-10-18 0 1 0 0 0 0 0
II. Continuous index
This part does not differ much of the previous one.
We build a continuous version of the index to be use to reindex
Monday is chosen as the first day of week (obviouly for the two indices)
#(1a) Build a series of first_day_of_week, monday is chosen as the weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D') #(1b) Build a continuous series of first_day_of_week continuous_weeks_index = pd.date_range(start=weeks_index.min(), end=weeks_index.max(), freq='W-MON') # monday #(2) Groupby, unstack, reindex, and some tidying df2 = ( df # groupby and count .groupby([df['Name'], weeks_index]) .count() .rename(columns={'Date':'Count'}) # unstack on weeks .swaplevel() # weeks to first level .sort_index() .unstack(1) # reindex to insert weeks with no data .reindex(continuous_weeks_index) # new index # clean up .fillna(0.0) .astype(int) .rename_axis('first_day_of_week') ) >>>print(df2) Name A B C D K M R first_day_of_week 2021-08-30 1 0 0 0 0 0 0 2021-09-06 0 0 3 1 0 0 0 2021-09-13 0 0 0 0 1 0 0 2021-09-20 0 0 0 1 0 0 1 2021-09-27 0 0 0 0 1 1 0 2021-10-04 0 0 0 0 0 0 0 2021-10-11 0 0 0 0 0 0 0 2021-10-18 0 1 0 0 0 0 0
Last step if needed
df2.stack()