I have these two initial tables:
Table1:
CustID | StartTime | EndTime | Area |
---|---|---|---|
1 | 12/1/2022 4:00:00 PM | 12/1/2022 4:05:00 PM | ABC |
2 | 12/1/2022 4:02:00 PM | 12/1/2022 4:03:00 PM | ABC |
Table2:
Area | StartTime | EndTime |
---|---|---|
ABC | 12/1/2022 4:01:26 PM | 12/1/2022 4:02:00 PM |
ABC | 12/1/2022 4:02:05 PM | 12/1/2022 4:02:55 PM |
ABC | 12/1/2022 4:04:10 PM | 12/1/2022 4:05:00 PM |
I need to end up with this:
Table3:
CustID | StartTime | EndTime | Area | #ofRecords |
---|---|---|---|---|
1 | 12/1/2022 4:00:00 PM | 12/1/2022 4:05:00 PM | ABC | 3 |
2 | 12/1/2022 4:02:00 PM | 12/1/2022 4:03:00 PM | ABC | 1 |
Would need an efficient way to merge the tables as the volume of rows is huge.
Using the StartTime, EndTime and Area from Table1, to determine how many records of those conditions are in Table2. Then adding a column to show this number of records.
I’ve tried left joining tables but its taking too long.
Advertisement
Answer
This might not be the most efficient way, but is one of the possible ways to achieve your goal.
I defined a function to calculate the number of records #ofRecords
and used apply()
for it, as follows:
import pandas as pd df1 = pd.DataFrame({ 'CustID': [1, 2], 'StartTime': ['12/1/2022 4:00:00 PM', '12/1/2022 4:02:00 PM'], 'EndTime': ['12/1/2022 4:05:00 PM', '12/1/2022 4:03:00 PM'], 'Area': ['ABC', 'ABC'], }) df2 = pd.DataFrame({ 'Area': ['ABC', 'ABC', 'ABC'], 'StartTime': ['12/1/2022 4:01:26 PM', '12/1/2022 4:02:05 PM', '12/1/2022 4:04:10 PM'], 'EndTime': ['12/1/2022 4:02:00 PM', '12/1/2022 4:02:55 PM', '12/1/2022 4:05:00 PM'], }) print(df1) print(df2) """ CustID StartTime EndTime Area 0 1 12/1/2022 4:00:00 PM 12/1/2022 4:05:00 PM ABC 1 2 12/1/2022 4:02:00 PM 12/1/2022 4:03:00 PM ABC Area StartTime EndTime 0 ABC 12/1/2022 4:01:26 PM 12/1/2022 4:02:00 PM 1 ABC 12/1/2022 4:02:05 PM 12/1/2022 4:02:55 PM 2 ABC 12/1/2022 4:04:10 PM 12/1/2022 4:05:00 PM """ # Convert a type of the StartTime and EndTime columns to datetime type df1['StartTime'] = pd.to_datetime(df1['StartTime']) df1['EndTime'] = pd.to_datetime(df1['EndTime']) df2['StartTime'] = pd.to_datetime(df2['StartTime']) df2['EndTime'] = pd.to_datetime(df2['EndTime']) def row_counter(st, et): # Calculate the number of records counter = 0 for i, row in df2.iterrows(): if (st <= row['StartTime']) and (et >= row['EndTime']): counter += 1 return counter df3 = df1.copy() # create a new dataframe by coping df1 df3['#ofRecords'] = df3.apply(lambda x: row_counter(x['StartTime'], x['EndTime']), axis=1) print(df3) """ CustID StartTime EndTime Area #ofRecords 0 1 2022-12-01 16:00:00 2022-12-01 16:05:00 ABC 3 1 2 2022-12-01 16:02:00 2022-12-01 16:03:00 ABC 1 """