The project requires running the same selection process with approximately 60 different selection criteria across 5 different time periods of land use data. An efficient, programmatic approach to the problem is absolutely necessary.

Last night, I finished writing a function to perform another set of steps in this selection process. Because land use changes over time, the function creates five different sets of polygons, one for each time period, to represent the size thresholds the land use data must meet.

For example, I need to include only land use polygons whose contiguous area is greater than 50 acres. An individual polygon may be smaller than 50 acres, but it should still be included if it is part of a contiguous fabric of polygons that exceeds that size.
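To make the contiguous-area rule concrete, here is a minimal sketch in Python. It is not the function described in this post: it assumes each polygon can be stood in for by an axis-aligned rectangle measured in feet, treats any shared edge or corner as "contiguous", and uses hypothetical names throughout (`touches`, `polygons_meeting_threshold`). It groups touching shapes into patches with a union-find and keeps every member of a patch whose total area exceeds the threshold.

```python
from collections import defaultdict

SQ_FT_PER_ACRE = 43_560  # one acre is 43,560 square feet


def area_acres(rect):
    """Area of a rectangle (xmin, ymin, xmax, ymax), given in feet, in acres."""
    xmin, ymin, xmax, ymax = rect
    return (xmax - xmin) * (ymax - ymin) / SQ_FT_PER_ACRE


def touches(a, b):
    """True if two rectangles overlap or share any boundary point.

    Assumption: a shared corner counts as contiguous in this sketch.
    """
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def polygons_meeting_threshold(rects, min_acres=50.0):
    """Return indexes of rectangles whose contiguous patch exceeds min_acres."""
    parent = list(range(len(rects)))
    # Merge every pair of touching rectangles into the same patch.
    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if touches(rects[i], rects[j]):
                parent[find(parent, i)] = find(parent, j)
    # Sum the area of each patch, keyed by its root.
    patch_area = defaultdict(float)
    for i, rect in enumerate(rects):
        patch_area[find(parent, i)] += area_acres(rect)
    return [i for i in range(len(rects))
            if patch_area[find(parent, i)] > min_acres]


# Three adjoining 1000 ft squares (about 23 acres each) plus one isolated square.
rects = [
    (0, 0, 1000, 1000),
    (1000, 0, 2000, 1000),
    (2000, 0, 3000, 1000),
    (5000, 5000, 6000, 6000),
]
print(polygons_meeting_threshold(rects))
```

None of the four squares reaches 50 acres on its own, but the first three form one patch of roughly 69 acres and are kept, while the isolated square is dropped.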

Before leaving work, I ran the function against some test data and when I arrived this morning I found the query was still running.

It took 7 hours to complete the one step!

Applying constraint to patch polygons
Applying bitmask to polygons that fail to meet size threshold
Patch Size Requirement Constraint Complete.

The function that was causing the delay was that final update to the base data.

What was causing it to run so slow? Well, the land use data is a conflation of all five time periods, weighing in at about 2.

I built a spatial index on both tables, but clearly that was not enough. Luckily, this data also has a region identifier. The land use data was split up into about 8, regions, each with its own unique region identifier. It was safe to use the region id, because no two polygons with different region ids ever touch, so they can never be contiguous.

The operation could be performed using only the spatial index, so I added the region id comparison alongside it, in the hopes that it would improve the results. When I ran the function again, the process took only about 15 minutes, which is approximately 28 times faster than the original 7 hours.
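The effect of the region id can be illustrated with a small Python sketch. It does not model a real spatial index or the actual query; it simply counts bounding-box comparisons with and without first grouping polygons by region, under the assumption stated above that polygons in different regions never touch. The tuple layout `(poly_id, region_id, bbox)` and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def bboxes_overlap(a, b):
    """True if two boxes (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_pairs_bbox_only(polys):
    """Compare every pair of polygons; return (pairs, number of checks)."""
    checks = 0
    pairs = []
    for p, q in combinations(polys, 2):
        checks += 1
        if bboxes_overlap(p[2], q[2]):
            pairs.append((p[0], q[0]))
    return pairs, checks


def candidate_pairs_by_region(polys):
    """Group by region id first, then compare only within each region."""
    by_region = defaultdict(list)
    for poly in polys:
        by_region[poly[1]].append(poly)
    checks = 0
    pairs = []
    for members in by_region.values():
        for p, q in combinations(members, 2):
            checks += 1
            if bboxes_overlap(p[2], q[2]):
                pairs.append((p[0], q[0]))
    return pairs, checks


# Two regions, each holding two adjoining polygons.
polys = [
    ("A", 1, (0, 0, 1, 1)),
    ("B", 1, (1, 0, 2, 1)),
    ("C", 2, (10, 10, 11, 11)),
    ("D", 2, (11, 10, 12, 11)),
]
print(candidate_pairs_bbox_only(polys))
print(candidate_pairs_by_region(polys))
```

Both approaches find the same touching pairs here, but grouping by region needs 2 comparisons instead of 6. The gap widens quickly as the number of regions grows, because a cheap equality test rules out most pairs before any geometry is examined.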

