I am trying to compare or merge two different data sets and I am using pandas for that.
The challenge that I am facing is that data is spread across rows in the first data set (Data1) and the other data set (Data2) has the same data spread across columns, below are the screenshots.
- Screenshot 1st – This is Data1
- Screenshot 2nd – This is Data2
Also, I have attached the same Excel workbook here for your reference.
What I am trying to do is convert one of them to another format to match the dataset and perform the merge.
Note: Transpose is not helping me, since I need to do it for each department and transpose does put everything either in rows or columns including department, whereas I only want to transpose weekly data.
What is the best way to achieve this in Python?
Advertisement
Answer
One option to transform the second dataframe is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df = pd.read_excel('Test_Data_Set.xlsx', sheet_name=None)
df1 = df['Data1']
df2 = df['Data2']
df3 = df2.pivot_longer(index = ['code', 'name'], names_to = 'day_of_week', names_pattern=r'(.+)s.+')
df1.merge(df3, on =['code', 'name', 'day_of_week']) 
    code              name day_of_week  start_time  end_time          value
0  test2  Test_Department2      Monday         900      1900  08:00 - 20:00
1  test2  Test_Department2     Tuesday         900      1900  08:00 - 20:00
2  test2  Test_Department2   Wednesday         900      1900  08:00 - 20:00
3  test2  Test_Department2    Thursday         900      1900  08:00 - 20:00
4  test2  Test_Department2      Friday         900      1900  10:00 - 19:00
5  test2  Test_Department2    Saturday         900      1900  10:00 - 19:00
6  test2  Test_Department2      Sunday         900      1900  12:00 - 17:00
 
						 
						
