I’m trying to create an additional column in a data frame to show the number of network days (excluding custom holidays) between two dates. I’m using a function to which I’m trying to pass dates from df
‘s columns as arguments, but I can’t make it work.
Below is my code (I’m using two made-up holidays in the given set):
from networkdays import networkdays import datetime as dt import numpy as np import pandas as pd public_holidays_list = [dt.date(2021, 1, 6), dt.date(2021, 1, 7)] public_holidays = set(public_holidays_list) def working_days(start, end, holidays): days = networkdays.Networkdays(start, end, holidays) working_days = len(days.networkdays()) return working_days
The formula itself works fine:
print(working_days(dt.date(2021, 1, 4), dt.date(2021, 1, 8), public_holidays))
3
Minimal data frame with the dtypes
I’m working on:
d = {'Name': ['A', 'B'], 'Start_Date': [dt.date(2021, 1, 4), dt.date(2021, 1, 11)], 'End_Date': [dt.date(2021, 1, 8), dt.date(2021, 1, 15)]} df = pd.DataFrame(data = d) df['Start_Date'] = pd.to_datetime(df['Start_Date']) df['End_Date'] = pd.to_datetime(df['End_Date'])
When I’m trying the below way…
df['Working_Days'] = working_days(df['Start_Date'], df['End_Date'])
…I’m getting an error:
AttributeError: ‘Series’ object has no attribute ‘days’
I’ve also tried to use numpy
:
df['Working_Days'] = np.vectorize(working_days)(df['Start_Date'], df['End_Date'])
I got an error as well:
AttributeError: ‘numpy.timedelta64’ object has no attribute ‘days’
Could you point me in the right direction?
EDIT: The correct answer to my problem is @Kris’s last comment.
IMPORTANT! Although the lambda
doesn’t return any errors, it takes public_holidays
into consideration correctly in 2 scenarios:
A) The elements of public_holidays
are of class datetime.date
and df
‘s dates are of class object
(I got this by removing pd.to_datetime()
lines from the code).
B) The public_holidays
is of type list
(created from an Excel table via public_holidays = df_ph['Date'].tolist()
), its elements are of class Timestamp
and pd.to_datetime()
lines are not removed from the code above (making dates in df
datetime64[ns]
).
Advertisement
Answer
As per my comment, use .apply
:
df['Working_Days'] = df.apply(lambda x: working_days(x.Start_Date, x.End_Date, public_holidays), axis=1)