Python DataFrame – only keep latest records from each month

Question

I have a pandas DataFrame with a date column. For each month I want to keep only the latest records (the rows with the most recent date) and remove any records that came before them. There will be duplicates and I want to keep them. I also need a new column with only the month and year.

Input
Provider	date
Apple	01/01/2022
Apple	05/01/2022
Apple	20/01/2022
Apple	20/01/2022

Accepted Answer

Create a month_year column with Series.dt.strftime, then use GroupBy.transform to get the maximum datetime per group from the original date column, compare it with each row's date, and filter with boolean indexing:

    import pandas as pd

    # Parse the day-first date strings into datetimes
    df['date'] = pd.to_datetime(df['date'], dayfirst=True)
    # Add the month/year label used for grouping
    df = df.assign(month_year = df['date'].dt.strftime('%m/%Y'))
    # Keep rows whose date equals the latest date in their Provider/month group
    df = df[df.groupby(['Provider', 'month_year'])['date'].transform('max').eq(df['date'])]
    print(df)

      Provider       date month_year
    2    Apple 2022-01-20    01/2022
    3    Apple 2022-01-20    01/2022
    5    Apple 2022-02-10    02/2022
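For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is built by hand from the six-row sample table on this page; the variable names `latest` and `result` are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the input table (dates are day-first strings).
df = pd.DataFrame({
    "Provider": ["Apple"] * 6,
    "date": ["01/01/2022", "05/01/2022", "20/01/2022",
             "20/01/2022", "05/02/2022", "10/02/2022"],
})

# Parse the strings as datetimes, reading the first number as the day.
df["date"] = pd.to_datetime(df["date"], dayfirst=True)

# Month/year label used for grouping and kept as the new column.
df["month_year"] = df["date"].dt.strftime("%m/%Y")

# For every row, the latest date within its Provider/month group.
latest = df.groupby(["Provider", "month_year"])["date"].transform("max")

# Keep the rows whose date equals that group maximum; ties (duplicates) survive.
result = df[df["date"].eq(latest)]
print(result)
```

Because transform returns a Series aligned to the original index, the comparison keeps every row that ties for the latest date, which is why both 20/01/2022 rows remain. If the dates need to be shown in the original day-first format afterwards, `result["date"].dt.strftime("%d/%m/%Y")` should convert them back to strings.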

Input:
Provider	date
Apple	01/01/2022
Apple	05/01/2022
Apple	20/01/2022
Apple	20/01/2022
Apple	05/02/2022
Apple	10/02/2022

Expected output:
Provider	date	month_year
Apple	20/01/2022	01/2022
Apple	20/01/2022	01/2022
Apple	10/02/2022	02/2022

Answer