Skip to content
Advertisement

I want to add date range where value is True in pandas

Dt 1/2/21 2/2/21 3/2/21 4/2/21 5/2/21 6/2/21 7/2/21 Attendance(Expected output in python)
san TRUE TRUE TRUE TRUE TRUE TRUE TRUE 1/2/21 – 7/2/21
don TRUE TRUE FALSE TRUE TRUE TRUE TRUE 1/2/21 -2/2/21,4/2/21-7/2/21
sam FALSE TRUE TRUE FALSE TRUE TRUE TRUE 2/2/21 – 3/2/21,5/2/21-7/2/21
den FALSE FALSE TRUE FALSE TRUE TRUE FALSE 3/2/21,5/2/21 – 6/2/21

I want to add Attendance column using pandas

Advertisement

Answer

I’m starting from a previous answer to get a list of dates in the column “Attendance”.

df['Attendance'] = [df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt)]

Now, I use pandas deltas (differences in time) to find consecutive days, extract only the first and last ones of each range of consecutives, and print it with the right format:

def get_consecutives(dates):
    consecutives = []
    delta_1day = pd.Timedelta('1d')    
    while len(dates) > 1:
        if dates[1] - dates[0] == delta_1day:
            consecutives.append(dates.pop(0))
        else:
            break
    consecutives.append(dates.pop(0))
    return consecutives

def all_days(dates):
  total = []
  while dates:
      total.append(get_consecutives(dates))

  formatted_total = ', '.join(['-'.join((range[0].strftime('%d/%m/%y'), range[-1].strftime('%d/%m/%y'))) if len(range) > 1 
                              else range[0].strftime('%d/%m/%y') for range in total])
  return formatted_total

Finally, I’ll make a function to map all the previous to the “Attendance” column:

def mapping(list_of_dates):
    dates = pd.to_datetime(list_of_dates, dayfirst=True,).to_list()
    return all_days(dates)

df['Attendance'] = df['Attendance'].map(mapping)
Advertisement