*This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis. To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography. Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS. The example dataset is a Manifold 8.0 .map file and can be found here.*

Our previous post showed how to extend the mean center of a geographic dataset to incorporate the weighted mean center using SQL. Today’s post examines the SQL code necessary to generate the central feature for a geographic data set. Recall from *Statistical Problem Solving in Geography* (third edition), the formula and computation of the Central Point.

This is where the power of spatial SQL gets used. As you can see, finding the central point requires us to compute a symmetrical matrix of the distance between each point and every other point. For our 7 point example, that’s 49 computations! With spatial SQL, the distance function allows us to determine the distance between each of the points such as:

**SELECT**** distance(a.id,b.id) AS dist
**

**FROM**

**points a, points b**

This SQL statement generates a table of distances between all the points – and we call the resulting column **dist**. However, we need to determine the sum of all the distances for each point, and then select the lowest sum. Because this SQL statement is a little complex, we will number each line and illustrate what is happening:

**SELECT TOP 1 SUM(distance(a.id,b.id)) AS dist, a.id AS aid****FROM points AS a, points AS b****WHERE a.id <> b.id****GROUP BY a.id****ORDER BY dist**

**Line 1** selects the distances between each point and every other point. However, because we have wrapped those distances in the **SUM **function, SQL will compute the sum of all the distances. We have the **TOP **clause in this line, and will get back to its use in a moment.

**Line 3** isn’t really necessary, but just ignores the computation of distances from each point with itself.

**Line 4** is **critical**, because the **SUM** function is an *aggregate* clause, meaning it aggregates the values that are selected – in our case, it sums the values of the distances. But, we don’t want the sum of ALL the distances, we want the sum of the distances aggregated by each of our 7 starting points. Therefore, the **GROUP BY** clause is used to group the sum of the distances for each point ID.

**Line 5** is also necessary because while we have the sum of the distances from each point to every other point, they aren’t in any particular order. So, using the **ORDER BY** clause, we can essentially sort the sum of the distances from low to high.

Remember, at this point we have the sum of all the distances from each point to every other point sorted by the distance. This means that the first record in the result is the point with the lowest sum of the distances to all other points, and the last record is the point with the highest sum of the distances.

We can now revisit Line 1, and the **TOP **clause. If the first record returned (due to our **ORDER BY** clause in Line 5) is the lowest summed distance, then selecting the **TOP **1 from that list obviously returns the feature with the lowest sum of all distances to all the other point.

Art, just purchased your book and looking forward to reading it. It appears in this blog post that your link to the example dataset is broken.

Thanks Ron. And yes, the link isn’t working. I will fix it tonight.

Ron,

The link is fixed You can try the examples now Let me know how it goes