Find max by year and return date on which max occurred in Pandas with dates as index

I have this dataframe

date,AA
1980-01-01, 77.7
1980-01-02, 86
1980-01-03, 92.3
1980-01-04, 96.4
1980-01-05, 85.7
1980-01-06, 75.7
1980-01-07, 86.8
1980-01-08, 93.2
1985-08-13, 224.6
1985-08-14, 213.9
1985-08-15, 205.7
1985-08-16, 207.3
1985-08-17, 202.1

JavaScript
​x
 
date,AA
1980-01-01, 77.7
1980-01-02, 86
1980-01-03, 92.3
1980-01-04, 96.4
1980-01-05, 85.7
1980-01-06, 75.7
1980-01-07, 86.8
1980-01-08, 93.2
1985-08-13, 224.6
1985-08-14, 213.9
1985-08-15, 205.7
1985-08-16, 207.3
1985-08-17, 202.1
​

I would like to compute the max for each year and the date where it happens. I am struggling because I would like to keep indeed the date as index.

Indeed I read it as:

dfr    = pd.read_csv(fnamed, sep=',', header = 0, index_col=0, parse_dates=True)

JavaScript
 
dfr    = pd.read_csv(fnamed, sep=',', header = 0, index_col=0, parse_dates=True)
​

I know that I could resample as

dfr_D = dfr.resample('Y').max()

JavaScript
 
dfr_D = dfr.resample('Y').max()
​

but in this case I would lose the information about the location of the maximum value within the year.

I have found this:

idx = dfr.groupby(lambda x: dfr['date'][x].year)["A"].idxmax()

JavaScript
 
idx = dfr.groupby(lambda x: dfr['date'][x].year)["A"].idxmax()
​

However, dfr[‘date’] seems to be the name of the column while in my case date in an index and ‘.year’ is not one of its properties.

I have the felling that I should work with “groupby” and “indexmax”. However, all the attends that I made, they all failed.

Thanks in advance

Answer

Assuming “date” is of datetime type and a column, you can use the following to slice your data with the maximum per group:

df.loc[df.groupby(df['date'].dt.year)['AA'].idxmax().values]

JavaScript
 
df.loc[df.groupby(df['date'].dt.year)['AA'].idxmax().values]
​

output:

        date     AA
3 1980-01-04   96.4
8 1985-08-13  224.6

JavaScript
 
        date     AA
3 1980-01-04   96.4
8 1985-08-13  224.6
​

If “date” is the index:

df.loc[df.groupby(df.index.year)['AA'].idxmax().values]

JavaScript
 
df.loc[df.groupby(df.index.year)['AA'].idxmax().values]
​

output:

               AA
date             
1980-01-04   96.4
1985-08-13  224.6

JavaScript
 
               AA
date             
1980-01-04   96.4
1985-08-13  224.6
​

Advertisement

Answer