I have two dataframes. month_data
dataframe has days from start of the month to the end.
student_df
with each student’s only present
data.
I’m trying to map both dataframes so that the remaining days left for each student should be marked as absent
month_data
day_of_month 0 2021-01-01 1 2021-01-02 2 2021-01-03 3 2021-01-04 4 2021-01-05 5 2021-01-06 6 2021-01-07 . . . . . . 30 2021-01-31
month_data = pd.DataFrame({'day_of_month':pd.date_range('01/01/2021','31/01/2021')})
student_df
student_data attendence day_of_month 0 marcus present 2021-01-01 1 marcus present 2021-01-03 2 paul present 2021-01-04
final_df
student_data attendence day_of_month 0 marcus present 2021-01-01 1 marcus absent 2021-01-02 2 marcus present 2021-01-03 3 marcus absent 2021-01-04 4 marcus absent 2021-01-05 . . . . . . . . 30 marcus absent 2021-01-31 31 paul absent 2021-01-01 32 paul absent 2021-01-02 33 paul absent 2021-01-03 34 paul present 2021-01-04 . . . . . . . . 61 paul absent 2021-01-31
Advertisement
Answer
You can create a new dataframe containing all dates and the a name for each student on each date:
Input
#df1 day_of_month 0 2021-01-01 1 2021-01-02 2 2021-01-03 3 2021-01-04 4 2021-01-05 5 2021-01-06 6 2021-01-07 #df2 student_data attendence day_of_month 0 marcus present 2021-01-01 1 marcus present 2021-01-03 2 paul present 2021-01-04
Code
students = df2['student_data'].unique().tolist() df = pd.concat([df1.assign(student_data = name) for name in students]) df = df.merge(df2, on=['day_of_month', 'student_data'], how='left') df['attendence'] = df['attendence'].fillna('absent')
Output
day_of_month student_data attendence 0 2021-01-01 marcus present 1 2021-01-02 marcus absent 2 2021-01-03 marcus present 3 2021-01-04 marcus absent 4 2021-01-05 marcus absent 5 2021-01-06 marcus absent 6 2021-01-07 marcus absent 7 2021-01-01 paul absent 8 2021-01-02 paul absent 9 2021-01-03 paul absent 10 2021-01-04 paul present 11 2021-01-05 paul absent 12 2021-01-06 paul absent 13 2021-01-07 paul absent