Trade Area Analysis in Postgres / PostGIS

How I used SQL to generate an accumulated sum of spatial data

My friend Tomas does work with business analytics, and wanted to find a way to perform trade area analysis.  He had a bunch of stores, and a map of census areas with populations.  What he wanted to figure out was:

how far do I have to radiate out from each store before I hit 5,000 customers

So for each store, he wanted to basically generate concentric buffers and sum up the population of the census areas before he hit 5,000.  Once he found the census areas, he also wanted to generate a convex hull to show the area.  ESRI has a really nice tool that performs this as part of their business analyst package, called threshold trade areas.

Check it out, it seems pretty cool.

Well, to help my friend I was thinking that I would determine the distance from every store to every census area, and then write a script with a giant for loop to iterate through each record, accumulating the results (of course, I would have to integrate some kind of do..while loop and an if…then to make sure I accumulated different counts for each store until I hit the threshold I wanted.  At that point I began asking myself so, how good of friend is Tomas?

What I did instead was write an SQL script to do it. I’ve color coded it below to explain what I was doing….

SELECT ST_ConvexHull(ST_Collect(g)) as geometry, max(sumpop) as sumpop, name
INTO tradearea
FROM
 (
  SELECT a.name, SUM(a.totpop) AS sumpop, ST_Collect(a.geometry) as g
  FROM 
     (SELECT stores.name, censusunit.totpop, censusunit.geometry,
      ST_Distance(censusunit.geometry,stores.geometry) as dist
      FROM stores, censusunit
      ) AS a,
     (SELECT stores.name, censusunit.totpop, censusunit.geometry,
      ST_Distance(censusunit.geometry,stores.geometry) as dist
      FROM stores, censusunit
      ) AS b
  WHERE a.name = b.name
  AND a.dist <= b.dist
  GROUP BY a.name, b.dist
  ) AS T1
WHERE sumpop < 5000
GROUP BY name

The middle portion in orange collects the names of the stores, the population of the census areas, the distance between each store and each census area, and the geometry of the census areas for each combination of stores and census areas.  So, if you have 5 stores and 1,000 census areas, you would have a table with 5,000 rows: Continue reading

New Books – How do I do that in PostGIS, How do I do that in Manifold SQL

I have two new books out – How do I do that in PostGISand How do I do that in Manifold SQL.  

From the back cover of How do I do that in PostGIS:

For those who are unsure if SQL is a sufficient language for performing GIS tasks, this book is for you. This guide follows the topic headings from the book How do I do that in ArcGIS/Manifold, as a way to illustrate the capabilities of the PostGIS SQL engine for accomplishing classic GIS tasks. With this book as a resource, users will be able to perform many classic GIS functions using nothing but SQL.

Continue reading

Spatial is Not Special – Quadrat Analysis

In our book we illustrated the use of quadrat analysis for determining whether points were random, clustered, or distributed.  Figure 14.9 from the book showed a point sample of 2,500 points, and Table 14.4 showed the mathematical calculation for quadrat analysis.

Image

 

 

Image

The calculations look pretty daunting, don’t they?  But, in actuality, its basic arithmetic.  In this blog I am only going to illustrate how we obtained the correct variance to mean ratio using spatial SQL.  If you want to understand quadrat analysis, check out the book, or do a web search. Continue reading

Spatial is Not Special – Nearest Neighbor Index

 

It is nice to get back to the book, and start talking about Statistical Problem Solving in Geography again.  Today we are going to look at the Nearest Neighbor Index.  You can refer to chapter 14 where we illustrate the computation of the nearest neighbor index using a set of 7 points:

 nnfig

Then, for each point we determine the nearest neighbor, its distance, and ultimately the average nearest neighbor distance over all the points:

 nncalc

To develop the SQL, we will take it one step at a time.  First, we want to compute the distance from every point to every other point:

SELECT a.pt, b.pt,distance(a.[Geom (I)],b.[Geom (I)]) AS dist
FROM points AS a, points AS b
WHERE a.pt <> b.pt
ORDER BY dist

This query gives us a table of the distance from every point to every other point.  We also play that game again where we rename the table “points” as “a” and “b” so that SQL thinks we have two different tables.  We also have to put a WHERE clause in to make sure we aren’t measuring from one point to itself – because the distance will be 0. Continue reading

ARC/INFO Functions in SQL – Line Density

Dale posted a request to see Line Density recreated in SQL.  Please take a look at the ESRI help topic for line density.

I will warn you, the SQL for this is a total mess – I think I can simplify it, but I only spent about 15 minutes writing the query, and it’s finals week and I am tired :-)

So, I have the code below, but first let me outline the steps I need to follow – we assume a raster called rastersurface, and a vector layer called lines

Image

1.  Take a raster (here we call it rastersurface)

2.  For each pixel, we have to obtain a point (this is where we issue the NewPoint function)

3.  For each point, we have to buffer it (in this case, I am using the buffer function with 1000m)

4.  For each of the buffered points, I am performing a clipintersect with the lines.

5.  The clipintersect of the lines and the buffers are for each base pixel – so, we sum the area of each line, and group it by the unique centerX and centerY or the base pixel.

6.  We wrap that entire thing inside an UPDATE query and update the [Height (I)] feature for each pixel.  To update the correct pixel value, we have to join the table where the centerX and centerY of the pixel equals the centerX and centerY of the points from which the buffer and summation of the areas are computed. Continue reading

ARC/INFO Functions – Symmetrical Difference

I was recently asked by danb to illustrate the ARC/INFO function symmetrical difference. We basically want to find those areas in layer A that don’t intersect layer B, and also find those areas in layer B that don’t intersect layer A.  Its pretty easy to do, as it is created in two parts: subtracting the first layer from the second layer and then subtracting the second layer from the first layer and UNIONING them together.  I’ve added the new layers here so you can test it out.

SELECT * FROM
(
SELECT a.id AS aid, b.id AS bid, clipsubtract(a.id,b.id) AS g
FROM a, b

UNION ALL

SELECT a.id AS aid,b.id AS bid, clipsubtract(b.id,a.id) AS g
FROM a, b
)
RIGHT JOIN B ON bid = b.id
RIGHT JOIN A ON aid = a.id

ARC/INFO Functions – UNION

We are going to conclude our overlay posts with the ARC/INFO UNION command.  This should be a lot easier than it actually is.  I think the problem is that Manifold may have a bug in the ClipSubract command.  What I discovered was that the ClipSubract clause only seems to work on the first geometry in a layer.  With that error, it would appear that we are dead in the water.  However, if Manifold’s ClipSubract only works on the first geometry, well, we can manipulate the query to pass ClipSubract a single geometry.  To do that, we use the UnionAll command to take all the geometries in one layer and union them together into a single geometry.

So, the ARC/INFO UNION command in SQL needs three things:

1.  The intersection of the two layers (to get the intersection of the two layers)
2.  The clipsubtract of layer 1 and layer 2 (to get the part of layer 1 not intersecting layer 2)
3.  The clipsubract of layer 2 and layer 1 (to get the part of layer 2 not intersecting layer 1)

But, we are still not out of the woods yet.  We need to attach the attributes to the layers. To do that, we have to fake out Manifold a little to think that we actually have a unique ID for the the subtract layer.  The ClipSubract clause correctly clips the geometries, but it will not return the ID for the subtract layer since we UNIONEDed the entire thing into a single geometry.  So, to fake Manifold out, we return a value of 0 as the id field for the layer.

SELECT * FROM
(
  SELECT ClipIntersect(circle.id,rectangles.id) AS g, circle.id AS cid, rectangles.id AS rid
  FROM circle, rectangles
UNION ALL
 SELECT Clipsubtract(circle.[Geom (I)],(SELECT UnionAll(id) FROM rectangles)) AS g, circle.id AS cid, 0 AS rid
 FROM circle
UNION ALL
 SELECT ClipSubtract(rectangles.[Geom (I)],(SELECT unionall(id) FROM circle)) AS g, 0 AS cid, rectangles.id AS rid
 FROM rectangles
)
LEFT JOIN [circle] ON circle.id = cid
LEFT JOIN [rectangles] ON rectangles.id = rid

ARC/INFO Functions in SQL – IDENTITY

The ARC/INFO IDENTITY operation seen here is sometimes confusing because it takes ALL of the Input feature, and the part of the IDENTITY feature that intersects the input feature and merges it into a new feature class.  I recreated the features in the ESRI help documentation so that we will work with a feature class called “Circle”, and one called “Rectangles”.

identity

To show how this is done in SQL, we should simply focus on the geometry aspect first, then bring the attributes in later.

SELECT
      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles

The IDENTITY operation takes two geometric operations.  First, we have to Intersect the two features to find the intersection area.  Then, we have to perform a ClipSubract to only include those areas of the input feature.  Now, the UNION ALL clause in SQL creates a new table, and sticks it under the previous table – therefore, we must have the same columns in order to do it.  The above SQL will create a geometric representation like this one:

Identityresult

So, this is what it should LOOK like, but the real power is that the attributes from the inputs are retained in the output. To do that, we take our above query, and just wrap it in a RIGHT JOIN clause:

SELECT * FROM
(
SELECT

      ClipIntersect
(circle.id,rectangles.id) g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
UNION ALL
SELECT
ClipSubtract
(rectangles.id,circle.id) AS g, int(circle.id) AS cid, int(rectangles.id) AS rid
FROM circle, rectangles
)
RIGHT JOIN [circle] ON circle.id = cid
RIGHT JOIN [rectangles] ON rectangles.id = rid
WHERE IsArea(g)

Go ahead, and give it a try with the example I have on my website.  Also, don’t forget our idea of spatial is not special.  This IDENTITY command isn’t some self contained function for which you have no control – you can always add other interesting clauses in the WHERE statement to select out certain features first, or some other interesting query.

Note:  A recent discussion on georeference.org illustrated that a lot of null values were returned.  The reason for this is because when we ask to return a ClipIntersect, if two objects don’t intersect, we’ve still asked the query engine to give us that result, and the result is in fact a null value.  So, I’ve added one line of code to only return the geometry if it is an area feature – I’ve written this in RED.  That is all you need, and it will work with more complicated features.

Spatial is Not Special – Spatial Interaction completion

We’ve spent some time looking at how to create spatial interaction models using PIVOT tables and spatial constructs in SQL.  The one last thing I wanted to add was to create an interaction model for point data. In this case, we will use the cities file we looked at previously.  I’ve highlighted in red the only change we have to make.  You’ll notice, we took our previous SQL code and changed the adjacency clause to a Distance clause.  In this example, if two cities are within 50 miles, we consider them neighbors.

TRANSFORM COUNT(*)
SELECT [Cities].name
FROM [Cities], [Cities] AS [Cities2]
WHERE DistanceEarth(Cities.[Geom (I)],Cities2.[Geom (i)],”mi”) < 50
GROUP BY [Cities].[Name]
PIVOT [Cities].[Name]

which gives us a binary, symmetrical matrix like:

cityadj

 

You can see that based on our 50 mile distance, Binghamton is a neighbor of Elmira and Ithaca, but is not considered a neighbor of Auburn, Rochester, or Syracuse.

In all of these examples, we have created a spatial weight matrix of some form – whether it be distance or adjacency.  These matrices are often used in fields like transportation modeling, economic geography, environmental modeling, or spatial regression.  I hope you have an opportunity to explore some of the really cool models that are used in these fields, and if you do, you now know how to create a spatial interaction matrix to feed that beast!

 

 

Spatial Interaction (adjacency)

We recently looked at spatial interaction in the form of inverse distance weighting.  However, some spatial interaction matrices are binary in nature (0,1), with 0 meaning the two geographic objects are not neighbors, and 1 meaning they are.  A neighbor in this case could be an adjacent are (i.e. New York and Pennsylvania are neighbors, while New York and Virginia are not) or within a specified distance (i.e. with a specified distance of 100 miles, Rochester and Syracuse are neighbors, but Syracuse and Boston are not).

In our example, we have 6 New England States:

Image

The SQL for creating the binary adjacency matrix is:

TRANSFORM COUNT(*)
SELECT [States].name
FROM [States], [States] AS [States2]
WHERE Adjacent(states.[Geom (I)],states2.[Geom (i)])
GROUP BY [States].[Name]
PIVOT [States2].[Name]

The only differences is that instead of computing the Min(Distance…), we are just obtaining the aggregate COUNT of the adjacent States by using the WHERE clause for spatial adjacency.  This makes sense because Vermont for example is adjacent to New Hampshire and Massachusetts.   But, the count of the number of times Vermot is adjacent to New Hampshire is just once – the same holds true for the other states.

The result is an adjacenty table like the following:

Image