I want to compare the dates ranging from January 1st till the current day, from this year versus from last year.
example: get the rows with dates ranging from january 1st till november 29th 2020 get the rows with dates ranging from january 1st till november 29th 2021
here is what I have tryed doing, and the only way I picture it done:
#get year to date dates for last year current_year = datetime.datetime.now().year today = datetime.datetime.now().day last_year = int(current_year) - 1 sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date']) sales_data_last_year = sales_data_df[sales_data_df['Date'].dt.year == last_year] sales_data_last_ytd = sales_data_last_year[sales_data_last_year['Date'].dt.day < today] sales_data_df2 = sales_data_last_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum}) sales_data_df2['COGS_last_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount'] #get year to date dates for this year sales_data_df['Date'] = pd.to_datetime(sales_data_df['Date']) sales_data_this_year = sales_data_df[sales_data_df['Date'].dt.year == current_year] sales_data_ytd = sales_data_last_year[sales_data_this_year['Date'].dt.day < today] sales_data_df2 = sales_data_ytd.groupby('Id').agg({'Quantity': sum, 'NetAmount': sum}) sales_data_df2['COGS_YTD'] = sales_data_df2['Quantity'] * sales_data_df2['NetAmount']
and here is the output error:
TypeError: '<' not supported between instances of 'int' and 'datetime.datetime'
I am fairly new to pandas and I don’t get why the error is about a date format mismatch since I made sure to use pd_todatetime, What is wrong with the processus I follow?
what would be alternative way to get that done without running in the same issue?
Advertisement
Answer
I think there’s an easier way to do this, I also think you should check if there are any leap years if you plan on using the code with more data.
import pandas as pd today = pd.to_datetime("now") first_day_2021 = pd.to_datetime("01-01-2021") today_last_year = today - pd.Timedelta(365, "day") first_day_2020 = pd.to_datetime("01-01-2020")
Then you can easily select periods, given you have a datetime index.
df[first_day_2021: today] df[first_day_2020: today_last_year]