df1
no From To check 1 27-Jan-20 28-Mar-20 a 2 28-Mar-20 12-Apr-20 a 3 29-May-20 29-May-20 b 4 5-Apr-20 12-Apr-20 b
df2
col1 col2 a 9-Apr-20 b 30-Mar-20
df
no From To check total Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1 27-Jan-20 28-Mar-20 a 45 5 20 20 2 28-Mar-20 12-Apr-20 a 9 2 7 3 29-May-20 29-May-20 b 1 1 4 5-Apr-20 12-Apr-20 b 5 5
i need to calculate 2 things
- column “Total” based on working days between “From” and “To” and include any holiday from df2.
- split the “Total” column in respective months (Jan to Dec columns)
For part 1 : The column “total” in df1 is calculated using
np.busday_count('2020-01-27','2020-03-28')
but this is not acurate and not able to include holiday(df2)in this i tried to directly create dataframe using
df['total']=np.busday_count(df1['From'].astype('datetime64[D]') ,df1['To'].astype('datetime64[D]'))
but it is giving error.
Advertisement
Answer
You can use bdate_range
in a custom function
# dict of num to month mapping months = pd.tseries.frequencies.MONTH_ALIASES df2['col2'] = pd.to_datetime(df2['col2'], dayfirst=True) # holiday month df['holiday'] = df['check'].map(df2.set_index(['col1'])['col2']).dt.month def count_by_month(s): start, end, holiday = s['From'], s['To'], s['holiday'] valid_dates = pd.bdate_range(start=start, end=end).month count = dict(pd.Series(valid_dates).value_counts()) # subtract holidays if holiday in count: count[holiday] -= 1 return pd.concat([s, pd.Series({v: count.get(k, 0) for k, v in months.items()})], axis=0) print(df) no From To check total holiday JAN FEB MAR APR 0 1 27-Jan-20 28-Mar-20 a 45 4 5 20 20 0 1 2 28-Mar-20 12-Apr-20 a 9 4 0 0 2 7 2 3 29-May-20 29-May-20 b 1 3 0 0 0 0 3 4 5-Apr-20 12-Apr-20 b 5 3 0 0 0 5 MAY JUN JUL AUG SEP OCT NOV DEC 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0