Skip to content
Advertisement

return last date and value each month in pandas

I have a df in pandas with daily data. I would like to return the last value of each month. I thought the simple solution would be to .resample("M").apply(lambda ser: ser.iloc[-1,]) however, it seems as if resample actually computes the month end date rather than return the actual date that appears that month. Is this intended behavior? MWE:

import pandas as pd
import numpy as np
df = pd.Series(np.arange(100), index=pd.date_range(start="2000-01-02", periods=100)).to_frame()
df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
#             0
#2000-01-31  29
#2000-02-29  58
#2000-03-31  89
#2000-04-30  99

While the last date appearing in df is 2000-04-10

Advertisement

Answer

By using resample with the offset M, you are downsampling your sample to calendar month’s end (see the linked documentation on offsets), and then passing a function. So your indices will always be the last day of that month, and this is indeed the intended behaviour. The function you are applying (lambda ser: ser.iloc[-1,]) is just saying: for the calendar date that ends on this day, what is the last value found in the original data.

For an example, you could also resample to month start, using the offset MS instead of M, and the result will be the same, except that the index will be the first day of the calendar month instead of the last day:

# Resample to month end, as you had originally:
>>> df.sort_index().resample("M").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-31  29
2000-02-29  58
2000-03-31  89
2000-04-30  99

# Resample to month start: same data, except index is month start instead of month end
>>> df.sort_index().resample("MS").apply(lambda ser: ser.iloc[-1,])
             0
2000-01-01  29
2000-02-01  58
2000-03-01  89
2000-04-01  99

As pointed out by Wen, if you just want to show the actual last date found in your data, you’re better off with a groupby. Resample is useful if you want to up- or down- sample your data to a different time frequency, not so much to select real data from your original time frequency

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement