k-nearest neighbor with SQL in Radian Studio

I wanted to give you another look at some features that Radian Studio will offer. I’ve shown how we can use SQL to replicate the ARC/INFO NEAR function, and how to perform Nearest Neighbor Analysis. But, another useful took is the ability to identify k-nearest neighbors. That is, rather than just identifying the nearest neighbor, you might want to identify the two, three, or k nearest neighbors.

Radian will allow that functionality by using the COLLECT aggregate clause. The COLLECT aggregate collects values from a subgroup into a table, returning a table with one or more fields.

it is like a SELECT which runs on a group. COLLECT takes a table and returns a table without requiring us to write a FROM section as we would with a SELECT. This is stuff that the real grown up databases like Oracle use, and Manifold is going to give it to us as part of Radian Studio.


SELECT park_no1,
SPLIT(COLLECT park_no2, dist
ORDER BY dist ASC FETCH 3
)
FROM
(
SELECT a.name AS park_no1, b.name AS park_no2,
GeomDistance(a.[geom (i)], b.[geom (i)], 0) AS dist
FROM [parks Table] AS a , [parks Table] AS b
WHERE a.name <> b.name
)
GROUP BY park_no1

Continue reading

ARC/INFO Functions in SQL – Line Density

Dale posted a request to see Line Density recreated in SQL.  Please take a look at the ESRI help topic for line density.

I will warn you, the SQL for this is a total mess – I think I can simplify it, but I only spent about 15 minutes writing the query, and it’s finals week and I am tired :-)

So, I have the code below, but first let me outline the steps I need to follow – we assume a raster called rastersurface, and a vector layer called lines

Image

1.  Take a raster (here we call it rastersurface)

2.  For each pixel, we have to obtain a point (this is where we issue the NewPoint function)

3.  For each point, we have to buffer it (in this case, I am using the buffer function with 1000m)

4.  For each of the buffered points, I am performing a clipintersect with the lines.

5.  The clipintersect of the lines and the buffers are for each base pixel – so, we sum the area of each line, and group it by the unique centerX and centerY or the base pixel.

6.  We wrap that entire thing inside an UPDATE query and update the [Height (I)] feature for each pixel.  To update the correct pixel value, we have to join the table where the centerX and centerY of the pixel equals the centerX and centerY of the points from which the buffer and summation of the areas are computed. Continue reading

ARC/INFO Functions – Symmetrical Difference

I was recently asked by danb to illustrate the ARC/INFO function symmetrical difference. We basically want to find those areas in layer A that don’t intersect layer B, and also find those areas in layer B that don’t intersect layer A.  Its pretty easy to do, as it is created in two parts: subtracting the first layer from the second layer and then subtracting the second layer from the first layer and UNIONING them together.  I’ve added the new layers here so you can test it out.

SELECT * FROM
(
SELECT a.id AS aid, b.id AS bid, clipsubtract(a.id,b.id) AS g
FROM a, b

UNION ALL

SELECT a.id AS aid,b.id AS bid, clipsubtract(b.id,a.id) AS g
FROM a, b
)
RIGHT JOIN B ON bid = b.id
RIGHT JOIN A ON aid = a.id

ARC/INFO Functions – UNION

We are going to conclude our overlay posts with the ARC/INFO UNION command.  This should be a lot easier than it actually is.  I think the problem is that Manifold may have a bug in the ClipSubract command.  What I discovered was that the ClipSubract clause only seems to work on the first geometry in a layer.  With that error, it would appear that we are dead in the water.  However, if Manifold’s ClipSubract only works on the first geometry, well, we can manipulate the query to pass ClipSubract a single geometry.  To do that, we use the UnionAll command to take all the geometries in one layer and union them together into a single geometry.

So, the ARC/INFO UNION command in SQL needs three things:

1.  The intersection of the two layers (to get the intersection of the two layers)
2.  The clipsubtract of layer 1 and layer 2 (to get the part of layer 1 not intersecting layer 2)
3.  The clipsubract of layer 2 and layer 1 (to get the part of layer 2 not intersecting layer 1)

But, we are still not out of the woods yet.  We need to attach the attributes to the layers. To do that, we have to fake out Manifold a little to think that we actually have a unique ID for the the subtract layer.  The ClipSubract clause correctly clips the geometries, but it will not return the ID for the subtract layer since we UNIONEDed the entire thing into a single geometry.  So, to fake Manifold out, we return a value of 0 as the id field for the layer.

SELECT * FROM
(
  SELECT ClipIntersect(circle.id,rectangles.id) AS g, circle.id AS cid, rectangles.id AS rid
  FROM circle, rectangles
UNION ALL
 SELECT Clipsubtract(circle.[Geom (I)],(SELECT UnionAll(id) FROM rectangles)) AS g, circle.id AS cid, 0 AS rid
 FROM circle
UNION ALL
 SELECT ClipSubtract(rectangles.[Geom (I)],(SELECT unionall(id) FROM circle)) AS g, 0 AS cid, rectangles.id AS rid
 FROM rectangles
)
LEFT JOIN [circle] ON circle.id = cid
LEFT JOIN [rectangles] ON rectangles.id = rid

ARC/INFO Functions in SQL – IDENTITY

The ARC/INFO IDENTITY operation seen here is sometimes confusing because it takes ALL of the Input feature, and the part of the IDENTITY feature that intersects the input feature and merges it into a new feature class.  I recreated the features in the ESRI help documentation so that we will work with a feature class called “Circle”, and one called “Rectangles”.

identity

To show how this is done in SQL, we should simply focus on the geometry aspect first, then bring the attributes in later.

SELECT
      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles

The IDENTITY operation takes two geometric operations.  First, we have to Intersect the two features to find the intersection area.  Then, we have to perform a ClipSubract to only include those areas of the input feature.  Now, the UNION ALL clause in SQL creates a new table, and sticks it under the previous table – therefore, we must have the same columns in order to do it.  The above SQL will create a geometric representation like this one:

Identityresult

So, this is what it should LOOK like, but the real power is that the attributes from the inputs are retained in the output. To do that, we take our above query, and just wrap it in a RIGHT JOIN clause:

SELECT * FROM
(
SELECT

      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
)
RIGHT JOIN [circle] ON circle.id = cid
RIGHT JOIN [rectangles] ON rectangles.id = rid
WHERE IsArea(g)

Go ahead, and give it a try with the example I have on my website.  Also, don’t forget our idea of spatial is not special.  This IDENTITY command isn’t some self contained function for which you have no control – you can always add other interesting clauses in the WHERE statement to select out certain features first, or some other interesting query.

Note:  A recent discussion on georeference.org illustrated that a lot of null values were returned.  The reason for this is because when we ask to return a ClipIntersect, if two objects don’t intersect, we’ve still asked the query engine to give us that result, and the result is in fact a null value.  So, I’ve added one line of code to only return the geometry if it is an area feature – I’ve written this in RED.  That is all you need, and it will work with more complicated features.

ARC/INFO Functions in SQL – INTERSECT

I promised you that I would address the attributes.  Remember, our previous query only performed the ClipSubract for the geometries, and did not add any additional attribute columns.  There are two ways to add the attributes.  The simplest way is to just call the columns individually or with a *:

SELECT ClipSubtract(square.[Geom (I)],box.[Geom (I)]) AS g, square.*
FROM square, box

That works fine for this example, but when we start merging attributes from multiple tables, things get a little trickier.  So, we can perform a RIGHT-JOIN on the resulting geometric table to stuff the attributes at the end of the table.  For example, to add the attributes, we simply write:

SELECT * FROM
(
SELECT ClipSubtract(square.[Geom (I)],box.[Geom (I)]) AS g,
square.id AS sid
FROM square, box
)
RIGHT JOIN [square] on square.id = sid

Here we are returning the geometric erase, but also the unique ID value and calling that sid.  The RIGHT JOIN clause takes the the attributes from the original [square] table and links it back in to the resulting geometric result in the inner portion of the query.

OK, now lets do the same thing with INTERSECT (here we will use the CIRCLE and RECTANGLE shapes that are like the ESRI help example:

SELECT * FROM
(
SELECT ClipIntersect(a.id,b.id) AS g, int(a.id) cid, int(b.id) AS rid
FROM circle AS a, rectangles AS b
)
RIGHT JOIN [circle] ON circle.id = cid
RIGHT JOIN [rectangles] ON rectangles.id = rid

Since we are dealing with more than one table, I decided to make things a little more generic and renamed the individual geometry tables a and b so to distinguish them from the original [circle] and [rectangles] tables.  Also, since we are dealing with more than one unique ID, I decided to call the [Circle] ID value cid, and the [Rectangles] ID value rid.

ARC/INFO Functions in SQL – ERASE, INTERSECT, IDENTITY, UNION

Today we are beginning another four-part session on the classical ESRI topological overlay functions of Erase, Intersect, Identity, and Union.  If you were an old ARC/INFO user, this is what made you get into GIS in the first place.  In fact, in the 1980s, if someone had a GIS the first thing you asked them was “can you do an INTERSECT?”  If you are not familiar with this, please, please, please visit the ESRI online help to learn about the functions here.

OK, a little name dropping here…..

In 1990 I was in Redlands, CA benchmarking ARC/INFO.  We had a really “difficult” task for them: overlay 25,000 hexagons with 25,000 triangles.  They were running it on a Sun  Sparcstation 1 – so, back then, this was very difficult.  They started the task before lunch, and it was done when we got back – very impressive.  After the two day benchmark was over, we were off to Ottawa to visit with GeoVision to run the benchmark there.  As we were leaving, Jack Dangermond grabbed my arm and said “you make them do the intersection test“.  I curtly nodded, and he pointed his finger at me and reiterated “no, I’m serious, you make them do it“.

Lo and behold, when the time came to run the test, the President of GeoVision acknowledged that based on their timing estimation, it would take around 3 days to complete the task – so they opted not to do it.  Back then, ARC/INFO was the only software that could do a task like that.

Now of course, other software products can do these tests, but I don’t think there is anything as fast as ARC/INFO doing overlays on a COVERAGE.

I have a Manifold .map file located here.  I’ve attempted to create the same shapes as is illustrated in the ESRI user manual.  Following the order in the ESRI manual, lets tackle ERASE first.

I did some heads-up digitizing of the box and the square used in the ESRI help file:

squarebox

The SQL for performing the ERASE is:

SELECT ClipSubtract(square.[Geom (I)],box.[Geom (I)]) AS g
FROM square, box

That’s it!  The result looks like this:

erase

Now, we haven’t done anything with the attributes, and this post has gotten a little long due to my earlier self-aggrandizing statements.  So I’ll tell you what – before we do INTERSECT in the next post, I will address the attributes.

ARC/INFO Functions in SQL – Tabulate Areas

I love the Tabulate Area command in ArcGIS.  The ability to create a cross tabulation of two feature datasets is very cool.  SQL has cross tabulation commands (heck, even Excel does this through a pivot table), and because we are saying that spatial data is just another datatype, we can leverage the cross tabulation queries in SQL to work on, say a spatial relationship like land use values from two different years, instead of looking at sales by store.

Lets assume we have land use values from two different years: 1970 and 2000 (I use this example with my students every year).  We will also assume that there are two fields called LC1970 and LC2000.  An example map looks like the following:

lc

The SQL code to generate this is actually standard use of the TRANSFORM command with a PIVOT command.  This is standard SQL stuff – the only spatial thing is that we are creating a table of the intersection between the 1970 and 2000 land use categories.

TRANSFORM SUM(Area(g))
SELECT lc2000
FROM
   (SELECT ClipIntersect([lu1970].[Geom (I)],[lu2000].[Geom (I)]) AS g,
[lc1970],[LC2000]
   FROM [lu1970],[lu2000]
   )
WHERE lc1970 <> “” and lc2000 <> “”
GROUP BY lc2000
PIVOT lc1970

The ClipIntersect is a spatial SQL construct that performs a geometric intersection of the two layers on-the-fly, in memory.  BTW, this is another benefit of SQL – there are no orphaned files lying around – most of it is done in memory and then released.

The resulting table from the example data set looks like this:

lctable

The ARC/INFO NEAR Function in SQL

Lets take another break from my textbook, and focus on a classic GIS operation.  If you are like me, you learned GIS with ARC/INFO.  I started with ARC/INFO 3.2 on a DEC VAX PDP 11.  Actually, my first ESRI product was AUTOMAP II on a Burroughs mainframe with the CANDE operating system – we even had to use a SYMAP ruler to plot our points for digitizing.

Anyway, for those of you who used ARC/INFO, you probably found that the NEAR function was one of your best friends.  I used it every week to transfer attributes from points to lines, and vice-verse.  This is pretty easy to do in SQL, but I will step through it for you to make it more understandable.  Lets assume we have two vector drawings: Points and Lines.  Obviously, the Points drawing has points in it, and the Lines drawing has lines in it.  The SQL to replicate the NEAR function is:

1. SELECT min(dist), first(lines.id),points.[ID] FROM
2.     (SELECT lines.id, points.id,distance(points.[Geom (I)],lines.[Geom (I)]) AS dist
3.       FROM points, lines
4.       ORDER BY dist)
5. GROUP BY points.id

Continue reading