Skip to content
Advertisement

Pandas grouping by week

I have a data frame in pandas like this:

Name  Date
A     9/1/21
B     10/20/21
C     9/8/21
D     9/20/21
K     9/29/21
K     9/15/21
M     10/1/21
C     9/12/21
D     9/9/21
C     9/9/21
R     9/20/21

I need to get the count of items by week.

weeks = [9/6/21, 9/13, 9/20/21, 9/27/21, 10/4/21]  

Example: From 9/6 to 9/13, the output should be:

Name  Weekly count
A     0
B     0
C     3
D     1
M     0
K     0
R     0

Similarly, I need to find the count on these intervals: 9/13 to 9/20, 9/20 to 9/27, and 9/27 to 10/4. Thank you!

Advertisement

Answer

May be with the caveat of the definition of the first day of a week, you could take something in the following code.

df = pd.DataFrame(data=d)
df['Date']=pd.to_datetime(df['Date'])

I. Discontinuous index

Monday is chosen as the first day of week

#(1) Build a series of first_day_of_week, monday is chosen as the first day of week
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D') 

#(2) Groupby and some tidying
df2 = ( df.groupby([df['Name'], weeks_index])
          .count()
          .rename(columns={'Date':'Count'})
        
          .swaplevel()   # weeks to first level
          .sort_index() 
          .unstack(1).fillna(0.0)
        
          .astype(int)
          .rename_axis('first_day_of_week')
      )

>>> print(df2)
Name                  A  B  C  D  K  M  R
first_day_of_week                        
2021-08-30            1  0  0  0  0  0  0
2021-09-06            0  0  3  1  0  0  0
2021-09-13            0  0  0  0  1  0  0
2021-09-20            0  0  0  1  0  0  1
2021-09-27            0  0  0  0  1  1  0
2021-10-18            0  1  0  0  0  0  0

II. Continuous index

This part does not differ much of the previous one.

We build a continuous version of the index to be use to reindex

Monday is chosen as the first day of week (obviouly for the two indices)

#(1a) Build a series of first_day_of_week, monday is chosen as the 
weeks_index = df['Date'] - df['Date'].dt.weekday * np.timedelta64(1, 'D')
#(1b) Build a continuous series of first_day_of_week
continuous_weeks_index = pd.date_range(start=weeks_index.min(), 
                                 end=weeks_index.max(),
                                 freq='W-MON')    # monday

#(2) Groupby, unstack, reindex, and some tidying
df2 = ( df
          # groupby and count
          .groupby([df['Name'], weeks_index])
          .count()
          .rename(columns={'Date':'Count'})
        
          # unstack on weeks 
          .swaplevel()    # weeks to first level
          .sort_index()
          .unstack(1)

          # reindex to insert weeks with no data
          .reindex(continuous_weeks_index)  # new index
        
          # clean up
          .fillna(0.0)               
          .astype(int)
          .rename_axis('first_day_of_week')
      )

>>>print(df2)
Name               A  B  C  D  K  M  R
first_day_of_week                     
2021-08-30         1  0  0  0  0  0  0
2021-09-06         0  0  3  1  0  0  0
2021-09-13         0  0  0  0  1  0  0
2021-09-20         0  0  0  1  0  0  1
2021-09-27         0  0  0  0  1  1  0
2021-10-04         0  0  0  0  0  0  0
2021-10-11         0  0  0  0  0  0  0
2021-10-18         0  1  0  0  0  0  0

Last step if needed

df2.stack()

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement