
Find nested boxes in a huge dataset with geopandas (or other tools)

I have a DataFrame with a huge number of boxes, each defined by an (xmin, ymin, xmax, ymax) tuple.

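For concreteness, here is a minimal sketch of what such a table could look like. The values are made up for illustration, and the column names xmin, ymin, xmax, ymax are an assumption.

```python
import pandas as pd

# Hypothetical sample: one row per box; column names are assumed.
boxes = pd.DataFrame(
    [
        (0, 0, 10, 10),    # large box
        (2, 2, 5, 5),      # nested inside the first box
        (8, 8, 12, 12),    # overlaps the first box but is not nested in it
        (20, 20, 25, 25),  # disjoint from all other boxes
    ],
    columns=["xmin", "ymin", "xmax", "ymax"],
)
print(boxes)
```

Here only the second row should be removed.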

My task is to remove all nested boxes, i.e. any box that lies within another box has to be removed.

My current method:

  • construct a GeoDataFrame with box geometries
  • sort by box size (descending)
  • iteratively find the smaller boxes that lie within each larger box.

Sandbox: https://www.kaggle.com/code/easzil/remove-nested-bbox/

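The three steps listed above can be sketched roughly as follows. This is not the notebook's code, only a minimal illustration: the column names xmin, ymin, xmax, ymax are assumed, and remove_nested_iterative is a hypothetical helper name.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

COLS = ["xmin", "ymin", "xmax", "ymax"]  # assumed column names


def remove_nested_iterative(df):
    """Sketch: keep the largest remaining box, drop everything inside it, repeat."""
    # Step 1: build one rectangular geometry per row.
    geoms = [box(*row) for row in df[COLS].itertuples(index=False, name=None)]
    gdf = gpd.GeoDataFrame(df.copy(), geometry=geoms)

    # Step 2: sort by box area, largest first.
    gdf["area"] = gdf.geometry.area
    remaining = gdf.sort_values("area", ascending=False)

    # Step 3: repeatedly take the largest box and discard the boxes within it.
    keep = []
    while len(remaining) > 0:
        largest_geom = remaining.geometry.iloc[0]
        keep.append(remaining.index[0])
        rest = remaining.iloc[1:]
        if len(rest) == 0:
            break
        inside = rest.geometry.within(largest_geom)
        remaining = rest[~inside]

    # Return the surviving rows in their original order.
    return df[df.index.isin(keep)]


if __name__ == "__main__":
    sample = pd.DataFrame(
        [(0, 0, 10, 10), (2, 2, 5, 5), (8, 8, 12, 12), (20, 20, 25, 25)],
        columns=COLS,
    )
    print(remove_nested_iterative(sample))
```

Each pass of the loop scans all remaining boxes, which is consistent with the slowness described for large inputs.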

Is there a better way to do this and make it faster? The current method is too slow for a large dataset.

I would also consider other methods, such as implementing it in C or CUDA.


Answer

  • your sample data is not large and has no instances of boxes within boxes, so I have generated some randomly
  • have used the approach of using loc to check for boxes whose dimensions are bigger
  • not sure if this is faster than your approach; see timing details
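
A minimal sketch of the loc idea, not the original answer's code: for each box, loc selects the other boxes whose bounds are at least as large on every side. The column names and the helper name remove_nested_loc are assumptions, and the comparisons are inclusive, so a box that touches its container's edge still counts as nested.

```python
import pandas as pd

COLS = ["xmin", "ymin", "xmax", "ymax"]  # assumed column names


def remove_nested_loc(df):
    """Sketch: drop every box that is contained in some other box."""
    # Exact duplicates would otherwise each count as nested in the other.
    df = df.drop_duplicates(subset=COLS)

    nested = []
    for row in df[COLS].itertuples(index=True, name=None):
        idx, xmin, ymin, xmax, ymax = row
        # Select all *other* boxes that are at least as large on every side.
        containers = df.loc[
            (df["xmin"] <= xmin)
            & (df["ymin"] <= ymin)
            & (df["xmax"] >= xmax)
            & (df["ymax"] >= ymax)
            & (df.index != idx)
        ]
        if not containers.empty:
            nested.append(idx)
    return df.drop(index=nested)


if __name__ == "__main__":
    sample = pd.DataFrame(
        [(0, 0, 10, 10), (2, 2, 5, 5), (8, 8, 12, 12), (20, 20, 25, 25)],
        columns=COLS,
    )
    print(remove_nested_loc(sample))
```

This compares every box against the whole table, so the cost still grows quadratically with the number of boxes.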

visuals


full code


approach 2

  • using sample data you provided on kaggle
  • this returns in about half the time (5s) compared to the previous version
  • concept is similar: a box is within another box if its xmin & ymin are greater than those of the other box and its xmax & ymax are less
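
The containment rule can be sketched with NumPy broadcasting as below. This is an illustration under assumptions rather than the answer's original code: the column names, the helper name remove_nested_broadcast and the chunk parameter are hypothetical, and the comparisons are inclusive (>= and <=), so swap in > and < if boxes sharing an edge should not count as nested.

```python
import numpy as np
import pandas as pd

COLS = ["xmin", "ymin", "xmax", "ymax"]  # assumed column names


def remove_nested_broadcast(df, chunk=1024):
    """Sketch: compare chunks of boxes against all boxes at once."""
    # Exact duplicates would otherwise each count as nested in the other.
    df = df.drop_duplicates(subset=COLS)
    a = df[COLS].to_numpy()
    nested = np.zeros(len(a), dtype=bool)

    # Process candidates in chunks so the (m, n) mask stays small in memory.
    for start in range(0, len(a), chunk):
        c = a[start:start + chunk]  # candidate inner boxes, shape (m, 4)
        # inside[i, j] is True when candidate i lies within box j.
        inside = (
            (c[:, None, 0] >= a[None, :, 0])
            & (c[:, None, 1] >= a[None, :, 1])
            & (c[:, None, 2] <= a[None, :, 2])
            & (c[:, None, 3] <= a[None, :, 3])
        )
        # Every box trivially contains itself, so clear those entries.
        rows = np.arange(len(c))
        inside[rows, start + rows] = False
        nested[start:start + chunk] = inside.any(axis=1)

    return df[~nested]


if __name__ == "__main__":
    sample = pd.DataFrame(
        [(0, 0, 10, 10), (2, 2, 5, 5), (8, 8, 12, 12), (20, 20, 25, 25)],
        columns=COLS,
    )
    print(remove_nested_broadcast(sample))
```

The work is still quadratic in the number of boxes, but the comparisons run inside NumPy rather than in a Python loop over rows.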