Dt | 1/2/21 | 2/2/21 | 3/2/21 | 4/2/21 | 5/2/21 | 6/2/21 | 7/2/21 | Attendance(Expected output in python) |
---|---|---|---|---|---|---|---|---|
san | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | TRUE | 1/2/21 – 7/2/21 |
don | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | TRUE | 1/2/21 -2/2/21,4/2/21-7/2/21 |
sam | FALSE | TRUE | TRUE | FALSE | TRUE | TRUE | TRUE | 2/2/21 – 3/2/21,5/2/21-7/2/21 |
den | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | 3/2/21,5/2/21 – 6/2/21 |
I want to add Attendance column using pandas
Advertisement
Answer
I’m starting from a previous answer to get a list of dates in the column “Attendance”.
df['Attendance'] = [df.columns[(df.iloc[i] == True)].tolist()) for i in range(len(df.Dt)]
Now, I use pandas deltas (differences in time) to find consecutive days, extract only the first and last ones of each range of consecutives, and print it with the right format:
def get_consecutives(dates): consecutives = [] delta_1day = pd.Timedelta('1d') while len(dates) > 1: if dates[1] - dates[0] == delta_1day: consecutives.append(dates.pop(0)) else: break consecutives.append(dates.pop(0)) return consecutives def all_days(dates): total = [] while dates: total.append(get_consecutives(dates)) formatted_total = ', '.join(['-'.join((range[0].strftime('%d/%m/%y'), range[-1].strftime('%d/%m/%y'))) if len(range) > 1 else range[0].strftime('%d/%m/%y') for range in total]) return formatted_total
Finally, I’ll make a function to map all the previous to the “Attendance” column:
def mapping(list_of_dates): dates = pd.to_datetime(list_of_dates, dayfirst=True,).to_list() return all_days(dates) df['Attendance'] = df['Attendance'].map(mapping)