I am trying to compare or merge two different data sets and I am using pandas for that.
The challenge that I am facing is that data is spread across rows in the first data set (Data1) and the other data set (Data2) has the same data spread across columns, below are the screenshots.
- Screenshot 1st – This is Data1
- Screenshot 2nd – This is Data2
Also, I have attached the same Excel workbook here for your reference.
What I am trying to do is convert one of them to another format to match the dataset and perform the merge.
Note: Transpose is not helping me, since I need to do it for each department and transpose does put everything either in rows or columns including department, whereas I only want to transpose weekly data.
What is the best way to achieve this in Python?
Advertisement
Answer
One option to transform the second dataframe is with pivot_longer from pyjanitor:
# pip install pyjanitor import pandas as pd import janitor df = pd.read_excel('Test_Data_Set.xlsx', sheet_name=None) df1 = df['Data1'] df2 = df['Data2'] df3 = df2.pivot_longer(index = ['code', 'name'], names_to = 'day_of_week', names_pattern=r'(.+)s.+') df1.merge(df3, on =['code', 'name', 'day_of_week']) code name day_of_week start_time end_time value 0 test2 Test_Department2 Monday 900 1900 08:00 - 20:00 1 test2 Test_Department2 Tuesday 900 1900 08:00 - 20:00 2 test2 Test_Department2 Wednesday 900 1900 08:00 - 20:00 3 test2 Test_Department2 Thursday 900 1900 08:00 - 20:00 4 test2 Test_Department2 Friday 900 1900 10:00 - 19:00 5 test2 Test_Department2 Saturday 900 1900 10:00 - 19:00 6 test2 Test_Department2 Sunday 900 1900 12:00 - 17:00