It is nice to get back to the **book**, and start talking about Statistical Problem Solving in Geography again. Today we are going to look at the Nearest Neighbor Index. You can refer to chapter 14 where we illustrate the computation of the nearest neighbor index using a set of 7 points:

Then, for each point we determine the nearest neighbor, its distance, and ultimately the average nearest neighbor distance over all the points:

To develop the SQL, we will take it one step at a time. First, we want to compute the distance from every point to every other point:

**SELECT** a.pt, b.pt,**distance**(a.[Geom (I)],b.[Geom (I)]) **AS** dist

**FROM** points **AS** a, points **AS **b

**WHERE** a.pt <> b.pt

**ORDER BY** dist

This query gives us a table of the distance from every point to every other point. We also play that game again where we rename the table “points” as “a” and “b” so that SQL thinks we have two different tables. We also have to put a WHERE clause in to make sure we aren’t measuring from one point to itself – because the distance will be 0.

This only gets us part of the way there. What we really want is the **nearest neighbor** for each point, not every neighbor. So, to do that, we will wrap the previous query into another query where we get the minimum (**min**) distance for the points in layer “a”. So, to do that, we use the min function which is an aggregate clause, and we also get the first point from “b” which we call “NN” because it is the nearest neighbor to the point in layer “a”. Since we have an aggregate clause, we close it off with the GROUP BY statement. That gives us the identical table you see in the book.

**SELECT min**(dist) **AS** dist, a.pt **AS **PT, first(b.pt) **AS** NN

**FROM**

(

** SELECT** a.pt, b.pt,**distance**(a.[Geom (I)],b.[Geom (I)]) **AS** dist

** FROM** points **AS** a, points **AS **b

** WHERE** a.pt <> b.pt

** ORDER BY** dist

)

**GROUP BY**

But again, we are only 99% of the way there. We don’t really want the distance from every point to its nearest neighbor (which the table above gives us), we want the average nearest neighbor distance. So, we just wrap the previous query in an AVG function.

**SELECT** **AVG**(dist) **AS** NNI

**FROM**

(

** SELECT min**(dist) **AS** dist, a.pt **AS **PT, first(b.pt) **AS** NN

** FROM**

(

** SELECT** a.pt, b.pt,**distance**(a.[Geom (I)],b.[Geom (I)]) **AS** dist

** FROM** points **AS** a, points **AS **b

** WHERE** a.pt <> b.pt

** ORDER BY** dist

)

**GROUP BY** a.pt

)

Hi Art

I seem to be struggling with understanding of where I am going wrong:

” …SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist

FROM points AS a, points AS b

WHERE a.pt b.pt

ORDER BY dist…”

When I run in Manifold I get the error ‘Unknown Column a.pt’

Why do you write Select a.pt… when you create ‘From points AS a ‘?

Any guidance would be greatly appreciated.

Cheers

Neill

Sorry – I should add that I have ordered the book to follow from, but not yet arrived here in NZ. Will the answer be in it?

N

I just ran it on the “bookExamples.map” file, and it ran fine. Are you using that example? Make sure you have a vector layer called “points”. Also, make sure it has a column called “PTS” to indicate the unique point.

Now, if you don’t have a column named PTS, you can always choose another column that would indicate a unique value.

Also, the book will only go over the theory, the mathematics, and the utility within a geographic context. It won’t go over Manifold.

Thanks artlembo

I have not been using the sample file but one of my own called points.

I do not have a column called pts. Makes sense now that I think about it.

Cheers

Actually, this is a good thing. It gives you an opportunity to make some changes. That will actually help you to better understand what the code is doing.

Pingback: k-nearest neighbor with SQL in Radian Studio | gisadvising