Skip to content
Advertisement

Checking segment length in dataframe 1 against multiple segment instances in dataframe 2

Background: I have two Pandas DataFrames:

  • DF1 represents known road segments with >= 7% truck traffic.
  • DF2 represents all road segments in the study area.

Columns: SRI is ‘standard route identifier’, MP_START is ‘mile point start’, MP_END is ‘mile point end’, and TRUCK_PCT is ‘truck traffic percentage’.

Task: For each row in DF1, the task is to check each record in DF2 to find two concurrent items:

  1. Find matching SRI records.
  2. Within the matching SRI records, find the rows in DF2 where DF1.MP_START and DF1.MP_END values are present within the DF2.MP_START and DF2.MP_END values.
  3. Then assign the DF1.TRUCK_PCT value to the correct row in DF2 following the above condition. An example follows below.

Sample DF1:

JavaScript

JavaScript

Sample DF2:

JavaScript

JavaScript

Expected result:

  • For the row where DF1.SRI == 00000009__, DF1.MP_START == 51.30 and DF1.MP_END == 52.260:

    DF1.MP_START on row 0 (51.30) falls within row 5 of DF2 and DF1.MP_END of row 0 (52.26) falls within row 4. Therefore, both DF2 rows 4 and 5 are assigned a TRUCK_PCT value of 7%.

  • Similarly, where DF2.SRI == 00000015__ (row 7) this row would be assigned a TRUCK_PCT of 7% because row 3 of DF1 contains MP_START and MP_END values of DF2 where DF1.SRI == DF2.SRI (00000015__).

Sample DF3 Output:

JavaScript

Disclaimer: This is a confusing problem – please ask for more info where needed and I will try and clarify.

Advertisement

Answer

A possible solution would be to first merge the two dataframes on the SRI column (keeping all rows in DF2) and then afterward find which rows fulfill the secondary condition of overlapping ranges.

First, merge the dataframes while keeping the index of DF2 for later usage:

JavaScript

This results in the following dataframe:

JavaScript

Now, we create a boolean index for the rows that have MP_START or MP_END within the two range columns. Any row that does not fulfill one of these conditions have their TRUCK_PCT value set to nan:

JavaScript

Finally, we can drop the two range columns and group by the previously created index to get the final result. This will preserve any rows in DF2 that do not have a match in DF1:

JavaScript

Result:

JavaScript
User contributions licensed under: CC BY-SA
10 People found this is helpful
Advertisement