Spatial is Not Special – Spatial interaction

In my book Introduction to Statistical Problem Solving in Geography, we talk about relationships among geographic objects. These relationships are often used in a spatial adjacency matrix.  In the book we refer to the age-old question of who is my neighbor, to better understand what signifies a neighbor.  Well, neighbors could be next to each other (adjacent), they could be within a specific distance (near), or they could be contained within one another.  Also, once we have a neighbor we need to determine what kind of neighbor it is – is it a strong neighbor or a weak neighbor?  That is known as a spatial interaction model.  The next three posts are going to look at forming a spatial interaction model for distance by examining distance matrices.  If you are going to be a quantitative geographer, you are going to have to learn about distance matrices.  I am going to use a Manifold .map file located here.

Lets assume that we have a table of 6 cities called Cities, with a geometry column called ‘g’.

sixcities

The basic distance computation for a set of cities is:

SELECT int(DistanceEarth([Cities].g, [Cities2].g,”mi”)) AS [Miles],
[Cities].name, [Cities2].name
FROM [Cities], [Cities] AS [Cities2]

The DistanceEarth function is a spatial function that computes the ellipsoidal distance between two points, and in our example we are returning the distance as miles.  Because we want to compute the distance from each city to every other city, we select the data from the same table twice, but we call one of the tables [Cities2] so that SQL thinks we are working with a different table.  The result of this query is a table that has “from” city and the “to” city along with the distance in miles.

cityqry

If you are paying attention, you would say “wait, that isn’t a matrix”.  That’s right. To create a matrix, we have to turn the resulting table into a PIVOT table to present the data in a matrix form (we did this in an earlier post):

TRANSFORM Min(int(DistanceEarth([Cities].g, [Cities2].g, “mi”)))
SELECT [Cities].name
FROM [Cities], [Cities] AS [Cities2]
GROUP BY [Cities].[Name]
PIVOT [Cities2].[Name]

In this case, we grab the Min Distance for each pair of cities, and GROUP it by the City name.  This way, we have the minimum distance for each pair of cities.  The PIVOT function will pivot the table so that we create an n x n symmetrical matrix of the distances.  The result is a table that looks like this:

crossmatrix

In our next post, we will look at an inverse and inverse distance squared matrix.  Our final post will create an adjacency matrix both for adjacent geographic objects, and those within a specified lag distance.

Spatial is Not Special – Variograms with SQL

Every geographer knows Tobler’s Law.  Near things are more similar to things that are far away.  From this, we often begin our discussions of spatial autocorrelation.  I’m going to actually show you how to do join count and Moran’s I in SQL, but today I want to show how we can create a variogram using SQL.  But, before you get too excited, I want to present a different take on the variogram, and that is the perspective of the geographer and not the geostatistician.

Here is the problem, the variogram is a really cool tool to assess the spatial relationship that features have with one another.  But, usually only reserve the variogram for higher level geostatistics and most geographers run away screaming from it.  What if instead we simply used the variogram as a descriptive measure for how a geographic feature correlations over space.

In my book Introduction to Statistical Problem Solving in Geography, we introduce the variogram in this simplified descriptive way.  To do that, we illustrate the idea with the last spring frost dates in the Southeastern United States:

Image

We show the students how a simple variogram illustrates how the last spring frost dates are spatially correlated from a descriptive standpoint.  In our dataset, one can see that the data is spatially correlated up to around 400 miles, but after that, it becomes random.

Image

Surprisingly (although if you have been reading this blog, not surprisingly), it is relatively easy to generate a variogram in SQL.  The code is (assuming we have a set of point in a table called ‘p’ with an attribute field [lsf]):

SELECT dist,avg(abs(diff))/2 AS semivariance
FROM
   (SELECT Floor(DistanceEarth(P.id,P3.id,”mi”)/50) AS dist,
(p.[lsf] – p3.[lsf]) AS diff
   FROM p, p AS p3)
GROUP BY dist

Once again, we will dive into the sub-query.  We want to compare every point with every other point, so we need two tables.  Issuing the  FROM p, p AS p3  portion will treat the table [p] like it is another table called [p3] – sneaky.  We are also computing for each point the distance between itself and every other point in the file in miles.  In addition, we are looking at the differences between the last spring frost date and called that result diff.

As for the distances, we are determining the distance between each point, and dividing it by 50 and also using the Floor function.  This is actually going to truncate the points at 50 miles (this is how we trick SQL into giving us lag distance.

Now, all of this is finally wrapped into the aggregate clause that shows the differences in distance grouped by the lag (multiples of 50) and their average differences.  The query returns two columns which we use as a scatterplot to show our vairogram.

Spatial is Not Special – Modifiable Area Unit Problem (MAUP)

I wrote a paper in 2006 about using SQL for the Modifiable Area Unit Problem (MAUP).  The example was an agricultural experiment field, and I showed how SQL could be used to calculate the descriptive statistics rather easily, and we also used SQL to develp a trend surface map to better understand the underlying trends of the data and determine the best areal units to use for grouping the data.  So, we thought we would use that example in the book.  MAUP is a conundrum that geographers have been thinking about for a long time, and quite frankly we will think about in the future.  I won’t go into what MAUP is here, for that you can buy the book!  But, one of our figures showed the data for 3 different blocking patterns and the corresponding descriptive statistics:

maup

The key is to develop descriptive statistics for each point that is located within each block.  This is easily accomplished through the CONTAINS and the GROUP BY clauses as follows:

SELECT avg(AVG_P) AS AVG_P, avg(AVG_K) AS AVG_K, var(AVG_K) AS VAR_K, var(AVG_P) AS VAR_P
FROM
(
   SELECT grid.id, avg(p) AVG_P, avg(k) AVG_K
   FROM grid, samples
   WHERE contains(grid.id,samples.id)
  GROUP BY grid.id
)

The subquery within the parenthesis computes the average Phosphorous (P) and Potassium (K) values for the soils, and groups it by the grid the points are contained in.  Based on our figure above, that means we will have 6 values for K and P.  If you wanted to leave it at that, you could put those values into a scatter plot to view the correlation, or wrap or copy the data to Excel and compute the correlation coefficient.  But, for us, we will take the 6 results from the sub query and compute the averages and variances of P and K for the overall grouped data.

Spatial is not special – Weighted Standard Distance

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

As we continue to move through my book Statistical Problem Solving in Geography, we extend the computation of the standard distance to that of the weighted standard distance. Following our example from the book, we have an attribute column called f, which represents the frequency.  The computation is presented as:

table4_6

The SQL for computing the weighted standard distance is:

SELECT sqr(
                       SUM(f*x^2)/(SELECT sum(f)
                                            FROM Points)(SUM(x*f)/(SELECT sum(f)
                                                                                 FROM Points))^2
+ SUM(f*y^2)/(SELECT SUM(f)
                                         FROM Points)(SUM(y*f)/(SELECT sum(f)
                                                                                       FROM Points))^2
                           )
FROM Points

This is almost identical to the standard distance computation, but instead of dividing by the number of features, we divide by the total frequency. We also subtract the weighted mean center coordinates, instead of simply the mean center.  And finally, we multiply each coordinate by the frequency. Once again, the proper placement of parentheses allow us to wrap that calculation inside the square root function. I’ve colored the parentheses to help understand which ones correspond to one another – hopefully the colors are more helpful than distracting.  This is what is so nice about having a worked example – putting this into SQL took me a little time because of the parenthesis issue.

So, this gets us the weighted standard distance. LIke the previous example of weighted mean center, we can wrap the query in the buffer function to create a geometry of the weighted standard distance centered on the weighted mean center.

SELECT buffer(newpoint((SUM(x*f)/(SELECT sum(f)
FROM Points))^2,
(SUM(y*f)/(SELECT sum(f)
FROM Points))^2),
                                          sqr
                                             (SUM(f*x^2)/(SELECT sum(f)
                                             FROM Points) – (SUM(x*f)/(SELECT sum(f)
                                                                                          FROM Points))^2
                                             + SUM(f*y^2)/(SELECT SUM(f)
                                                                     FROM Points) – (SUM(y*f)/(SELECT sum(f)
                                                                                                  FROM Points))^2))
        FROM Points

Remember from earlier posts that the buffer function requires a geometry and a distance [buffer([geometry],distance)]. So, for the [geometry] field, we are passing our mean center, and for our distance, we are passing our standard distance. Again, the hardest thing about this is getting the parentheses correct!!!  Always work with a small example so you can make sure your answers are correct – this will scale just fine once you have it working.

Hopefully, with the gray text, you can see how easy it is to take the results from the standard distance query along with the mean center geometry, and just insert it into the buffer function.

Spatial is Not Special – Standard Distance

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

As we continue to move through my book  Statistical Problem Solving in Geography, we come to another important descriptive spatial statistic called the standard distance.  Following our example from the book, we see the computation presented as:

fig4_5

Continue reading

Spatial is Not Special – Central Feature

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

Our previous post showed how to extend the mean center of a geographic dataset to incorporate the weighted mean center using SQL.  Today’s post examines the SQL code necessary to generate the central feature for a geographic data set.  Recall from Statistical Problem Solving in Geography (third edition), the formula and computation of the Central Point. Continue reading

Spatial is Not Special – Weighted Mean Center

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my bookStatistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

In our previous post we saw how easy it was to compute the mean center of a geographic dataset with SQL.  Today’s post examines the SQL code necessary to generate the weighted mean center for a geographic data set.  Recall from Statistical Problem Solving in Geography (third edition),  the formula and computation of weighted mean center and the 7 point data set used. Continue reading

Spatial is not Special – Mean Center

This post is part of a series entitled Spatial is not Special, where I will illustrate how spatial constructs in SQL provide us with a rich toolset for geographers doing spatial analysis.  To illustrate these concepts, I will be showing examples from my book Statistical Problem Solving in Geography.  Even though PostGRES, SQLServer, MySQL, spatialite, and Oracle all support spatial SQL, these examples will be presented using Manifold GIS.  The example dataset is a Manifold 8.0 .map file and can be found here.

Today’s post examines the SQL code necessary to generate the mean center for a geographic data set.  Recall from Statistical Problem Solving in Geography (third edition),  the formula for mean center and the 7 point data set used. Continue reading