Skip to content
Advertisement

Pandas: Get year to date dates from previous year

I want to compare the dates ranging from January 1st till the current day, from this year versus from last year.

example: get the rows with dates ranging from january 1st till november 29th 2020 get the rows with dates ranging from january 1st till november 29th 2021

here is what I have tryed doing, and the only way I picture it done:

#get year to date dates for last year
current_year = datetime.datetime.now().year
today = datetime.datetime.now().day
last_year = int(current_year) - 1
sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date'])
sales_data_last_year = sales_data_df[sales_data_df['Date'].dt.year == last_year]
sales_data_last_ytd = sales_data_last_year[sales_data_last_year['Date'].dt.day < today]
sales_data_df2 = sales_data_last_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum})
sales_data_df2['COGS_last_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount']

#get year to date dates for this year

sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date'])
sales_data_this_year = sales_data_df[sales_data_df['Date'].dt.year == current_year]
sales_data_ytd = sales_data_last_year[sales_data_this_year['Date'].dt.day < today]
sales_data_df2 = sales_data_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum})
sales_data_df2['COGS_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount']
 

and here is the output error:

TypeError: '<' not supported between instances of 'int' and 'datetime.datetime'

I am fairly new to pandas and I don’t get why the error is about a date format mismatch since I made sure to use pd_todatetime, What is wrong with the processus I follow?

what would be alternative way to get that done without running in the same issue?

Advertisement

Answer

I think there’s an easier way to do this, I also think you should check if there are any leap years if you plan on using the code with more data.

import pandas as pd

today = pd.to_datetime("now")
first_day_2021 = pd.to_datetime("01-01-2021")
today_last_year = today - pd.Timedelta(365, "day")
first_day_2020 = pd.to_datetime("01-01-2020")

Then you can easily select periods, given you have a datetime index.

df[first_day_2021: today]
df[first_day_2020: today_last_year]
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement