Skip to content
Advertisement

Sort a pandas dataframe series by month name

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement: I want to make it appear by month and compute the mean price for each month and present it with a sorted manner by month.

Desired Output:

 month mean_price
  Jan    XXX
  Feb    XXX
  Mar    XXX

I thought of making a list and passing it in a sort function:

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

but the sort_values doesn’t support that for series.

One big problem I have is that even though

df = df.sort_values(by='date',ascending=True,inplace=True) works to the initial df but after I did a groupby, it didn’t maintain the order coming out from the sorted df.

To conclude, I needed from the initial data frame these two columns. Sorted the datetime column and through a groupby using the month (dt.strftime(‘%B’)) the sorting got messed up. Now I have to sort it by month name.


My code:

df # has 5 columns though I need the column 'date' and 'price'

df.sort_values(by='date',inplace=True) #at this part it is sorted according to date, great
total=(df.groupby(df['date'].dt.strftime('%B'))['price'].mean()) # Though now it is not as it was but instead the months appear alphabetically

Advertisement

Answer

Thanks @Brad Solomon for offering a faster way to capitalize string!

Note 1 @Brad Solomon’s answer using pd.categorical should save your resources more than my answer. He showed how to assign order to your categorical data. You should not miss it :P

Alternatively, you can use.

df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                  ["aug", 11], ["jan", 11], ["jan", 1]], 
                   columns=["Month", "Price"])
# Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
df["Month"] = df["Month"].str.capitalize()

# Now the dataset should look like
#   Month Price
#   -----------
#    Dec    XX
#    Jan    XX
#    Apr    XX

# make it a datetime so that we can sort it: 
# use %b because the data use the abbreviation of month
df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
df = df.sort_values(by="Month")

total = (df.groupby(df['Month'])['Price'].mean())

# total 
Month
1     17.333333
3     11.000000
8     16.000000
12    12.000000

Note 2 groupby by default will sort group keys for you. Be aware to use the same key to sort and groupby in the df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, one may gets unintended behavior. See Groupby preserve order among groups? In which way? for more information.

Note 3 A more computationally efficient way is first compute mean and then do sorting on months. In this way, you only need to sort on 12 items rather than the whole df. It will reduce the computational cost if one don’t need df to be sorted.

Note 4 For people already have month as index, and wonder how to make it categorical, take a look at pandas.CategoricalIndex @jezrael has a working example on making categorical index ordered in Pandas series sort by month index

User contributions licensed under: CC BY-SA
1 People found this is helpful
Advertisement