
Python DataFrame – only keep the latest records from each month

I have a pandas DataFrame with a date column. For each month, I want to keep only the records with the latest date and remove any records that came before it. Rows that share that latest date are duplicates, and I want to keep all of them. I also need a new column containing only the month and year.

Input

Provider date
Apple 01/01/2022
Apple 05/01/2022
Apple 20/01/2022
Apple 20/01/2022
Apple 05/02/2022
Apple 10/02/2022

Output:

Provider date month_year
Apple 20/01/2022 01/2022
Apple 20/01/2022 01/2022
Apple 10/02/2022 02/2022


Answer

Create the month_year column with Series.dt.strftime, then use GroupBy.transform to compute the maximum datetime within each month_year group, compare it with the original date column, and filter the rows with boolean indexing.
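The code that originally accompanied this answer is not preserved here, so the following is a minimal sketch of the approach described, not the answerer's exact code. It assumes the dates are day-first strings (DD/MM/YYYY), as in the question's sample, and that grouping by month_year alone is enough; the variable names `dates`, `latest` and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Provider": ["Apple"] * 6,
    "date": ["01/01/2022", "05/01/2022", "20/01/2022",
             "20/01/2022", "05/02/2022", "10/02/2022"],
})

# Parse the day-first strings into datetimes (assumed format: DD/MM/YYYY)
dates = pd.to_datetime(df["date"], format="%d/%m/%Y")

# New column holding only month and year
df["month_year"] = dates.dt.strftime("%m/%Y")

# Latest date within each month, broadcast back to every row of that month
latest = dates.groupby(df["month_year"]).transform("max")

# Keep only rows whose date equals the latest date of their month;
# rows tied on that date are all kept
result = df[dates.eq(latest)]
print(result)
```

If the frame holds several providers and the latest date should be found per provider, grouping by both `df["Provider"]` and `df["month_year"]` would likely be the adjustment to make.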

User contributions licensed under: CC BY-SA