Skip to content
Advertisement

Python DataFrame: Map two dataframes based on day of month?

I have two dataframes. month_data dataframe has days from start of the month to the end.

student_df with each student’s only present data.

I’m trying to map both dataframes so that the remaining days left for each student should be marked as absent

month_data

        day_of_month
    0    2021-01-01
    1    2021-01-02
    2    2021-01-03
    3    2021-01-04
    4    2021-01-05
    5    2021-01-06
    6    2021-01-07
    .        .
    .        .
    .        .
    30    2021-01-31

month_data = pd.DataFrame({'day_of_month':pd.date_range('01/01/2021','31/01/2021')})

student_df

            student_data      attendence     day_of_month
    0       marcus              present       2021-01-01
    1       marcus              present       2021-01-03
    2       paul                present       2021-01-04

final_df

            student_data      attendence     day_of_month
    0       marcus              present       2021-01-01
    1       marcus              absent        2021-01-02
    2       marcus              present       2021-01-03
    3       marcus              absent        2021-01-04
    4       marcus              absent        2021-01-05
    .         .                   .               .
    .         .                   .               .
    30      marcus              absent        2021-01-31

    31      paul                absent        2021-01-01
    32      paul                absent        2021-01-02
    33      paul                absent        2021-01-03
    34      paul                present       2021-01-04
    .         .                   .               .
    .         .                   .               .
    61      paul                absent        2021-01-31

Advertisement

Answer

You can create a new dataframe containing all dates and the a name for each student on each date:

Input

#df1
  day_of_month
0   2021-01-01
1   2021-01-02
2   2021-01-03
3   2021-01-04
4   2021-01-05
5   2021-01-06
6   2021-01-07

#df2
  student_data attendence day_of_month
0       marcus    present   2021-01-01
1       marcus    present   2021-01-03
2         paul    present   2021-01-04

Code

students = df2['student_data'].unique().tolist()
df = pd.concat([df1.assign(student_data = name) for name in students])
df = df.merge(df2, on=['day_of_month', 'student_data'], how='left')
df['attendence'] = df['attendence'].fillna('absent')

Output

   day_of_month student_data attendence
0    2021-01-01       marcus    present
1    2021-01-02       marcus     absent
2    2021-01-03       marcus    present
3    2021-01-04       marcus     absent
4    2021-01-05       marcus     absent
5    2021-01-06       marcus     absent
6    2021-01-07       marcus     absent
7    2021-01-01         paul     absent
8    2021-01-02         paul     absent
9    2021-01-03         paul     absent
10   2021-01-04         paul    present
11   2021-01-05         paul     absent
12   2021-01-06         paul     absent
13   2021-01-07         paul     absent
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement