Skip to content
Advertisement

Split Business days in respective month

df1

no  From        To          check   
1   27-Jan-20   28-Mar-20   a                                    
2   28-Mar-20   12-Apr-20   a                                 
3   29-May-20   29-May-20   b                             
4   5-Apr-20    12-Apr-20   b                                 

df2

col1    col2
a       9-Apr-20
b       30-Mar-20

df

no  From        To          check   total   Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1   27-Jan-20   28-Mar-20   a       45      5   20  20                                  
2   28-Mar-20   12-Apr-20   a       9           2   7                               
3   29-May-20   29-May-20   b       1                   1                           
4   5-Apr-20    12-Apr-20   b       5               5   

i need to calculate 2 things

  1. column “Total” based on working days between “From” and “To” and include any holiday from df2.
  2. split the “Total” column in respective months (Jan to Dec columns)

For part 1 : The column “total” in df1 is calculated using

np.busday_count('2020-01-27','2020-03-28')

but this is not acurate and not able to include holiday(df2)in this i tried to directly create dataframe using

df['total']=np.busday_count(df1['From'].astype('datetime64[D]')
,df1['To'].astype('datetime64[D]'))

but it is giving error.

Advertisement

Answer

You can use bdate_range in a custom function

# dict of num to month mapping
months = pd.tseries.frequencies.MONTH_ALIASES

df2['col2'] = pd.to_datetime(df2['col2'], dayfirst=True)

# holiday month
df['holiday'] = df['check'].map(df2.set_index(['col1'])['col2']).dt.month

def count_by_month(s):

      start, end, holiday = s['From'], s['To'], s['holiday']

      valid_dates = pd.bdate_range(start=start, end=end).month
      count = dict(pd.Series(valid_dates).value_counts())

      # subtract holidays
      if holiday in count:
         count[holiday] -= 1
    
      return pd.concat([s, pd.Series({v: count.get(k, 0) for k, v in months.items()})], axis=0)

print(df)

   no         From            To check  total  holiday  JAN  FEB  MAR  APR  
0   1    27-Jan-20     28-Mar-20     a     45        4    5   20   20    0   
1   2    28-Mar-20     12-Apr-20     a      9        4    0    0    2    7   
2   3    29-May-20     29-May-20     b      1        3    0    0    0    0   
3   4    5-Apr-20      12-Apr-20     b      5        3    0    0    0    5   

   MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC  
0    0    0    0    0    0    0    0    0  
1    0    0    0    0    0    0    0    0  
2    1    0    0    0    0    0    0    0  
3    0    0    0    0    0    0    0    0  
User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement