I have two dataframes. month_data dataframe has days from start of the month to the end.
student_df with each student’s only present data.
I’m trying to map both dataframes so that the remaining days left for each student should be marked as absent
month_data
day_of_month
0 2021-01-01
1 2021-01-02
2 2021-01-03
3 2021-01-04
4 2021-01-05
5 2021-01-06
6 2021-01-07
. .
. .
. .
30 2021-01-31
month_data = pd.DataFrame({'day_of_month':pd.date_range('01/01/2021','31/01/2021')})
student_df
student_data attendence day_of_month
0 marcus present 2021-01-01
1 marcus present 2021-01-03
2 paul present 2021-01-04
final_df
student_data attendence day_of_month
0 marcus present 2021-01-01
1 marcus absent 2021-01-02
2 marcus present 2021-01-03
3 marcus absent 2021-01-04
4 marcus absent 2021-01-05
. . . .
. . . .
30 marcus absent 2021-01-31
31 paul absent 2021-01-01
32 paul absent 2021-01-02
33 paul absent 2021-01-03
34 paul present 2021-01-04
. . . .
. . . .
61 paul absent 2021-01-31
Advertisement
Answer
You can create a new dataframe containing all dates and the a name for each student on each date:
Input
#df1 day_of_month 0 2021-01-01 1 2021-01-02 2 2021-01-03 3 2021-01-04 4 2021-01-05 5 2021-01-06 6 2021-01-07 #df2 student_data attendence day_of_month 0 marcus present 2021-01-01 1 marcus present 2021-01-03 2 paul present 2021-01-04
Code
students = df2['student_data'].unique().tolist()
df = pd.concat([df1.assign(student_data = name) for name in students])
df = df.merge(df2, on=['day_of_month', 'student_data'], how='left')
df['attendence'] = df['attendence'].fillna('absent')
Output
day_of_month student_data attendence 0 2021-01-01 marcus present 1 2021-01-02 marcus absent 2 2021-01-03 marcus present 3 2021-01-04 marcus absent 4 2021-01-05 marcus absent 5 2021-01-06 marcus absent 6 2021-01-07 marcus absent 7 2021-01-01 paul absent 8 2021-01-02 paul absent 9 2021-01-03 paul absent 10 2021-01-04 paul present 11 2021-01-05 paul absent 12 2021-01-06 paul absent 13 2021-01-07 paul absent