On the Manifold GIS discussion board, someone asked about updating a polygon layer with the distance of the farthest pair of vertices for each polygon. I thought that was a fun little conundrum to try out. You can see my Manifold SQL answer here.
That made me think about how to write the same query in PostGIS. It’s actually fairly easy, as the code below shows:
SELECT max(ST_Distance(A.g,B.g)) , A.gid
FROM
(SELECT gid, (ST_Dump(geom)).geom as g
FROM floodzone) AS A,
(SELECT gid, (ST_Dump(geom)).geom as g
FROM floodzone) AS B
WHERE A.gid = B.gid
GROUP BY A.gid
Now, I am doing it for a layer called floodzone. The first thing you have do, as illustrated in lines 3-6 is get the gid and geometry field from the floodzone. We do this twice, because we have to fake Postgres into thinking their are two tables we are selecting from (A and B). The ST_Dump function dumps out all the coordinates for each polygon as a geometry dump. So, the .geom reference turns the geometry dump into actual geometries. So, at the end of lines 3 and 4, you have a table with the gid of the polygon and all the individual point geometries.
Since we have the vertices busted out, we are using line 1 to obtain the maximum distance (ST_Distance) between every vertex. Then, when we issue line 8, we are grouping that maximum distance by the gid. The final result is table that tells us the maximum distance between vertices for each polygon.
Let’s speed things up
On this file, it took 15 seconds. So I got to thinking: how can we make it faster. Well, in basic big-O notation, the algorithmic complexity of what I’m doing above is O(n^2). One way to improve things (especially when you have polygons with massive numbers of vertices), is to reduce the polygon vertices to a convex hull. A convex hull runs in O(nlogn), and when you are done you have much fewer vertices. Then, when you run the distance-to-distance calculation, you are running the O(n^2) function on much fewer points.
The updated query is simply returns the ST_ConvexHull of the polygons:
SELECT max(ST_Distance(A.g,B.g)) , A.gid
FROM
(SELECT gid, (ST_Dump((geom))).geom as g
FROM floodzone) AS A,
(SELECT gid, (ST_Dump((geom))).geom as g
FROM floodzone) AS B
WHERE A.gid = B.gid
GROUP BY A.gid
Running the query this way, the result was achieved in 2 seconds!!
This is why I teach a few lectures on computational geometry in my undergraduate GIS Programming class.
If you want to learn more tricks for using SQL like this, take a look at my Spatial SQL class – hey, for that matter, take a look at all my classes here.
Comments