I have a dataframe made up of dummy car purchases across a year which looks like:
df =
purchase_date brand 2021-02-13 BMW 2021-02-28 BMW 2021-03-10 Audi 2021-03-11 BMW ...
What I’m looking for is to get an aggregated count of each brand of car for each month in 2021, so it would look like this:
df =
BMW Audi (2021-02) 2 0 (2021-03) 1 1 ...
So far I’ve used this code to group by the year, month but I can’t split it to count individual brands:
df = df.groupby([df['purchase_date'].dt.year.rename('year'), df3['purchase_date'].dt.month.rename('month')]).agg({'count'})
This returns:
('brand','count') (2021-02) 2 (2021-03) 2
Advertisement
Answer
Use crosstab
with month periods:
df1 = pd.crosstab(df['purchase_date'].dt.to_period('m').rename('year'), df['brand']) print (df1) brand Audi BMW year 2021-02 0 2 2021-03 1 1
Your solution is with add column brand
, aggregate GroupBy.size
and Series.unstack
:
df2 = (df.groupby([df['purchase_date'].dt.year.rename('year'), df['purchase_date'].dt.month.rename('month'), 'brand']) .size() .unstack(fill_value=0)) print (df2) brand Audi BMW year month 2021 2 0 2 3 1 1
Alternative:
df3 = (df.groupby([pd.Grouper(freq='MS',key='purchase_date'), 'brand']) .size() .unstack(fill_value=0)) print (df3) brand Audi BMW purchase_date 2021-02-01 0 2 2021-03-01 1 1