Spatial is Not Special – Quadrat Analysis

In our book we illustrated the use of quadrat analysis for determining whether points were random, clustered, or distributed.  Figure 14.9 from the book showed a point sample of 2,500 points, and Table 14.4 showed the mathematical calculation for quadrat analysis.

Image

 

 

Image

The calculations look pretty daunting, don’t they?  But, in actuality, its basic arithmetic.  In this blog I am only going to illustrate how we obtained the correct variance to mean ratio using spatial SQL.  If you want to understand quadrat analysis, check out the book, or do a web search. Continue reading

Spatial is Not Special – Nearest Neighbor Index

 

It is nice to get back to the book, and start talking about Statistical Problem Solving in Geography again.  Today we are going to look at the Nearest Neighbor Index.  You can refer to chapter 14 where we illustrate the computation of the nearest neighbor index using a set of 7 points:

 nnfig

Then, for each point we determine the nearest neighbor, its distance, and ultimately the average nearest neighbor distance over all the points:

 nncalc

To develop the SQL, we will take it one step at a time.  First, we want to compute the distance from every point to every other point:

SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist
FROM points AS a, points AS b
WHERE a.pt <> b.pt
ORDER BY dist

This query gives us a table of the distance from every point to every other point.  We also play that game again where we rename the table “points” as “a” and “b” so that SQL thinks we have two different tables.  We also have to put a WHERE clause in to make sure we aren’t measuring from one point to itself – because the distance will be 0. Continue reading

ARC/INFO Functions – Symmetrical Difference

I was recently asked by danb to illustrate the ARC/INFO function symmetrical difference. We basically want to find those areas in layer A that don’t intersect layer B, and also find those areas in layer B that don’t intersect layer A.  Its pretty easy to do, as it is created in two parts: subtracting the first layer from the second layer and then subtracting the second layer from the first layer and UNIONING them together.  I’ve added the new layers here so you can test it out.

SELECT * FROM
(
SELECT a.id AS aid, b.id AS bid, clipsubtract(a.id,b.id) AS g
FROM a, b

UNION ALL

SELECT a.id AS aid,b.id AS bid, clipsubtract(b.id,a.id) AS g
FROM a, b
)
RIGHT JOIN B ON bid = b.id
RIGHT JOIN A ON aid = a.id

ARC/INFO Functions – UNION

We are going to conclude our overlay posts with the ARC/INFO UNION command.  This should be a lot easier than it actually is.  I think the problem is that Manifold may have a bug in the ClipSubract command.  What I discovered was that the ClipSubract clause only seems to work on the first geometry in a layer.  With that error, it would appear that we are dead in the water.  However, if Manifold’s ClipSubract only works on the first geometry, well, we can manipulate the query to pass ClipSubract a single geometry.  To do that, we use the UnionAll command to take all the geometries in one layer and union them together into a single geometry.

So, the ARC/INFO UNION command in SQL needs three things:

1.  The intersection of the two layers (to get the intersection of the two layers)
2.  The clipsubtract of layer 1 and layer 2 (to get the part of layer 1 not intersecting layer 2)
3.  The clipsubract of layer 2 and layer 1 (to get the part of layer 2 not intersecting layer 1)

But, we are still not out of the woods yet.  We need to attach the attributes to the layers. To do that, we have to fake out Manifold a little to think that we actually have a unique ID for the the subtract layer.  The ClipSubract clause correctly clips the geometries, but it will not return the ID for the subtract layer since we UNIONEDed the entire thing into a single geometry.  So, to fake Manifold out, we return a value of 0 as the id field for the layer.

SELECT * FROM
(
  SELECT ClipIntersect(circle.id,rectangles.id) AS g, circle.id AS cid, rectangles.id AS rid
  FROM circle, rectangles
UNION ALL
 SELECT Clipsubtract(circle.[Geom (I)],(SELECT UnionAll(id) FROM rectangles)) AS g, circle.id AS cid, 0 AS rid
 FROM circle
UNION ALL
 SELECT ClipSubtract(rectangles.[Geom (I)],(SELECT unionall(id) FROM circle)) AS g, 0 AS cid, rectangles.id AS rid
 FROM rectangles
)
LEFT JOIN [circle] ON circle.id = cid
LEFT JOIN [rectangles] ON rectangles.id = rid

ARC/INFO Functions in SQL – IDENTITY

The ARC/INFO IDENTITY operation seen here is sometimes confusing because it takes ALL of the Input feature, and the part of the IDENTITY feature that intersects the input feature and merges it into a new feature class.  I recreated the features in the ESRI help documentation so that we will work with a feature class called “Circle”, and one called “Rectangles”.

identity

To show how this is done in SQL, we should simply focus on the geometry aspect first, then bring the attributes in later.

SELECT
      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles

The IDENTITY operation takes two geometric operations.  First, we have to Intersect the two features to find the intersection area.  Then, we have to perform a ClipSubract to only include those areas of the input feature.  Now, the UNION ALL clause in SQL creates a new table, and sticks it under the previous table – therefore, we must have the same columns in order to do it.  The above SQL will create a geometric representation like this one:

Identityresult

So, this is what it should LOOK like, but the real power is that the attributes from the inputs are retained in the output. To do that, we take our above query, and just wrap it in a RIGHT JOIN clause:

SELECT * FROM
(
SELECT

      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
)
RIGHT JOIN [circle] ON circle.id = cid
RIGHT JOIN [rectangles] ON rectangles.id = rid
WHERE IsArea(g)

Go ahead, and give it a try with the example I have on my website.  Also, don’t forget our idea of spatial is not special.  This IDENTITY command isn’t some self contained function for which you have no control – you can always add other interesting clauses in the WHERE statement to select out certain features first, or some other interesting query.

Note:  A recent discussion on georeference.org illustrated that a lot of null values were returned.  The reason for this is because when we ask to return a ClipIntersect, if two objects don’t intersect, we’ve still asked the query engine to give us that result, and the result is in fact a null value.  So, I’ve added one line of code to only return the geometry if it is an area feature – I’ve written this in RED.  That is all you need, and it will work with more complicated features.

Spatial is Not Special – Modifiable Area Unit Problem (MAUP)

I wrote a paper in 2006 about using SQL for the Modifiable Area Unit Problem (MAUP).  The example was an agricultural experiment field, and I showed how SQL could be used to calculate the descriptive statistics rather easily, and we also used SQL to develp a trend surface map to better understand the underlying trends of the data and determine the best areal units to use for grouping the data.  So, we thought we would use that example in the book.  MAUP is a conundrum that geographers have been thinking about for a long time, and quite frankly we will think about in the future.  I won’t go into what MAUP is here, for that you can buy the book!  But, one of our figures showed the data for 3 different blocking patterns and the corresponding descriptive statistics:

maup

The key is to develop descriptive statistics for each point that is located within each block.  This is easily accomplished through the CONTAINS and the GROUP BY clauses as follows:

SELECT avg(AVG_P) AS AVG_P, avg(AVG_K) AS AVG_K, var(AVG_K) AS VAR_K, var(AVG_P) AS VAR_P
FROM
(
   SELECT grid.id, avg(p) AVG_P, avg(k) AVG_K
   FROM grid, samples
   WHERE contains(grid.id,samples.id)
  GROUP BY grid.id
)

The subquery within the parenthesis computes the average Phosphorous (P) and Potassium (K) values for the soils, and groups it by the grid the points are contained in.  Based on our figure above, that means we will have 6 values for K and P.  If you wanted to leave it at that, you could put those values into a scatter plot to view the correlation, or wrap or copy the data to Excel and compute the correlation coefficient.  But, for us, we will take the 6 results from the sub query and compute the averages and variances of P and K for the overall grouped data.

The ARC/INFO NEAR Function in SQL

Lets take another break from my textbook, and focus on a classic GIS operation.  If you are like me, you learned GIS with ARC/INFO.  I started with ARC/INFO 3.2 on a DEC VAX PDP 11.  Actually, my first ESRI product was AUTOMAP II on a Burroughs mainframe with the CANDE operating system – we even had to use a SYMAP ruler to plot our points for digitizing.

Anyway, for those of you who used ARC/INFO, you probably found that the NEAR function was one of your best friends.  I used it every week to transfer attributes from points to lines, and vice-verse.  This is pretty easy to do in SQL, but I will step through it for you to make it more understandable.  Lets assume we have two vector drawings: Points and Lines.  Obviously, the Points drawing has points in it, and the Lines drawing has lines in it.  The SQL to replicate the NEAR function is:

1. SELECT min(dist), first(lines.id),points.[ID] FROM
2.     (SELECT lines.id, points.id,distance(points.[Geom (I)],lines.[Geom (I)]) AS dist
3.       FROM points, lines
4.       ORDER BY dist)
5. GROUP BY points.id

Continue reading

Stream Sinuosity with Spatial SQL

In case you are getting tired of reading examples from my book (did I mention that I have a textbook you can buy from Amazon?), I thought I would throw a post in here and there that illustrates other aspects of SQL and GIS.  A friend recently sent me a file of stream segments, and wanted to determine the sinuosity of each stream segment.  You may recall the sinuosity formula as the length of the segment / distance between the start and endpoint of a line.  This is essentially a ratio of the theoretical straight line distance between the endpoints and the actual distance along the segment.

 Lets suppose we have a vector file named Stream_Sample. Sinuosity is easily calculated with SQL and a few spatial constructs as:

SELECT Length([Geom (I)]) / Distance(StartPoint([Geom (I)]),EndPoint([Geom (I)])) AS ss
FROM [Stream_Sample]

Continue reading

Spatial is not special – Weighted Standard Distance

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

As we continue to move through my book Statistical Problem Solving in Geography, we extend the computation of the standard distance to that of the weighted standard distance. Following our example from the book, we have an attribute column called f, which represents the frequency.  The computation is presented as:

table4_6

The SQL for computing the weighted standard distance is:

SELECT sqr(
                       SUM(f*x^2)/(SELECT sum(f)
                                            FROM Points)(SUM(x*f)/(SELECT sum(f)
                                                                                 FROM Points))^2
+ SUM(f*y^2)/(SELECT SUM(f)
                                         FROM Points)(SUM(y*f)/(SELECT sum(f)
                                                                                       FROM Points))^2
                           )
FROM Points

This is almost identical to the standard distance computation, but instead of dividing by the number of features, we divide by the total frequency. We also subtract the weighted mean center coordinates, instead of simply the mean center.  And finally, we multiply each coordinate by the frequency. Once again, the proper placement of parentheses allow us to wrap that calculation inside the square root function. I’ve colored the parentheses to help understand which ones correspond to one another – hopefully the colors are more helpful than distracting.  This is what is so nice about having a worked example – putting this into SQL took me a little time because of the parenthesis issue.

So, this gets us the weighted standard distance. LIke the previous example of weighted mean center, we can wrap the query in the buffer function to create a geometry of the weighted standard distance centered on the weighted mean center.

SELECT buffer(newpoint((SUM(x*f)/(SELECT sum(f)
FROM Points))^2,
(SUM(y*f)/(SELECT sum(f)
FROM Points))^2),
                                          sqr
                                             (SUM(f*x^2)/(SELECT sum(f)
                                             FROM Points) – (SUM(x*f)/(SELECT sum(f)
                                                                                          FROM Points))^2
                                             + SUM(f*y^2)/(SELECT SUM(f)
                                                                     FROM Points) – (SUM(y*f)/(SELECT sum(f)
                                                                                                  FROM Points))^2))
        FROM Points

Remember from earlier posts that the buffer function requires a geometry and a distance [buffer([geometry],distance)]. So, for the [geometry] field, we are passing our mean center, and for our distance, we are passing our standard distance. Again, the hardest thing about this is getting the parentheses correct!!!  Always work with a small example so you can make sure your answers are correct – this will scale just fine once you have it working.

Hopefully, with the gray text, you can see how easy it is to take the results from the standard distance query along with the mean center geometry, and just insert it into the buffer function.

Spatial is Not Special – Standard Distance

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

As we continue to move through my book  Statistical Problem Solving in Geography, we come to another important descriptive spatial statistic called the standard distance.  Following our example from the book, we see the computation presented as:

fig4_5

Continue reading