Skip to content
Advertisement

How to use df groupby to return counts on specific values in column across each month

I have a dataframe made up of dummy car purchases across a year which looks like:

df =

purchase_date     brand
2021-02-13        BMW
2021-02-28        BMW
2021-03-10        Audi
2021-03-11        BMW
...

What I’m looking for is to get an aggregated count of each brand of car for each month in 2021, so it would look like this:

df =

              BMW   Audi
(2021-02)     2     0 
(2021-03)     1     1
...

So far I’ve used this code to group by the year, month but I can’t split it to count individual brands:

df = df.groupby([df['purchase_date'].dt.year.rename('year'), df3['purchase_date'].dt.month.rename('month')]).agg({'count'})

This returns:

              ('brand','count')
(2021-02)     2
(2021-03)     2

Advertisement

Answer

Use crosstab with month periods:

df1 = pd.crosstab(df['purchase_date'].dt.to_period('m').rename('year'), df['brand'])
print (df1)
brand    Audi  BMW
year              
2021-02     0    2
2021-03     1    1

Your solution is with add column brand, aggregate GroupBy.size and Series.unstack:

df2 = (df.groupby([df['purchase_date'].dt.year.rename('year'), 
                 df['purchase_date'].dt.month.rename('month'), 'brand'])
        .size()
        .unstack(fill_value=0))
print (df2)
brand       Audi  BMW
year month           
2021 2         0    2
     3         1    1

Alternative:

df3 = (df.groupby([pd.Grouper(freq='MS',key='purchase_date'), 'brand'])
        .size()
        .unstack(fill_value=0))
print (df3)
brand          Audi  BMW
purchase_date           
2021-02-01        0    2
2021-03-01        1    1
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement