df1
no From To check 1 27-Jan-20 28-Mar-20 a 2 28-Mar-20 12-Apr-20 a 3 29-May-20 29-May-20 b 4 5-Apr-20 12-Apr-20 b
df2
col1 col2 a 9-Apr-20 b 30-Mar-20
df
no From To check total Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 1 27-Jan-20 28-Mar-20 a 45 5 20 20 2 28-Mar-20 12-Apr-20 a 9 2 7 3 29-May-20 29-May-20 b 1 1 4 5-Apr-20 12-Apr-20 b 5 5
i need to calculate 2 things
- column “Total” based on working days between “From” and “To” and include any holiday from df2.
- split the “Total” column in respective months (Jan to Dec columns)
For part 1 : The column “total” in df1 is calculated using
np.busday_count('2020-01-27','2020-03-28')
but this is not acurate and not able to include holiday(df2)in this i tried to directly create dataframe using
df['total']=np.busday_count(df1['From'].astype('datetime64[D]')
,df1['To'].astype('datetime64[D]'))
but it is giving error.
Advertisement
Answer
You can use bdate_range in a custom function
# dict of num to month mapping
months = pd.tseries.frequencies.MONTH_ALIASES
df2['col2'] = pd.to_datetime(df2['col2'], dayfirst=True)
# holiday month
df['holiday'] = df['check'].map(df2.set_index(['col1'])['col2']).dt.month
def count_by_month(s):
start, end, holiday = s['From'], s['To'], s['holiday']
valid_dates = pd.bdate_range(start=start, end=end).month
count = dict(pd.Series(valid_dates).value_counts())
# subtract holidays
if holiday in count:
count[holiday] -= 1
return pd.concat([s, pd.Series({v: count.get(k, 0) for k, v in months.items()})], axis=0)
print(df)
no From To check total holiday JAN FEB MAR APR
0 1 27-Jan-20 28-Mar-20 a 45 4 5 20 20 0
1 2 28-Mar-20 12-Apr-20 a 9 4 0 0 2 7
2 3 29-May-20 29-May-20 b 1 3 0 0 0 0
3 4 5-Apr-20 12-Apr-20 b 5 3 0 0 0 5
MAY JUN JUL AUG SEP OCT NOV DEC
0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0