How I used SQL to generate an accumulated sum of spatial data
My friend Tomas works in business analytics and wanted to find a way to perform trade area analysis. He had a set of stores and a map of census areas with populations. What he wanted to figure out was:
How far do I have to radiate out from each store before I hit 5,000 customers?
So for each store, he basically wanted to generate concentric buffers and sum up the population of the census areas before he hit 5,000. Once he found those census areas, he also wanted to generate a convex hull to show the area. ESRI has a really nice tool called threshold trade areas, part of their Business Analyst package, that performs this.
Check it out, it seems pretty cool.
Well, to help my friend, I was thinking that I would determine the distance from every store to every census area, and then write a script with a giant for loop to iterate through each record, accumulating the results. Of course, I would have to integrate some kind of do..while loop and an if..then to make sure I accumulated different counts for each store until I hit the threshold I wanted. At that point I began asking myself: so, how good a friend is Tomas?
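For comparison, here is a rough sketch of what that loop-based script might have looked like, written in Python purely for illustration. The function name, the tuple layout, and the sample numbers are all made up, and the distances are assumed to be precomputed. It stops just before the running total would reach the threshold.

```python
from collections import defaultdict


def areas_within_threshold(distances, threshold):
    """Accumulate census areas per store, nearest first, while the
    running population total stays below `threshold`.

    `distances` is an iterable of (store, area, population, dist)
    tuples; this layout is an assumption made for the sketch.
    """
    by_store = defaultdict(list)
    for store, area, pop, dist in distances:
        by_store[store].append((dist, area, pop))

    result = {}
    for store, rows in by_store.items():
        rows.sort()  # nearest census area first
        total = 0
        chosen = []
        for dist, area, pop in rows:
            # Stop before the area that would push us to the threshold.
            if total + pop >= threshold:
                break
            total += pop
            chosen.append(area)
        result[store] = (chosen, total)
    return result


# Hypothetical sample data: two stores, three census areas.
sample = [
    ("A", "c1", 2000, 1.0),
    ("A", "c2", 2500, 3.0),
    ("A", "c3", 1000, 6.0),
    ("B", "c1", 2000, 9.0),
    ("B", "c2", 2500, 7.0),
    ("B", "c3", 1000, 4.0),
]
trade_areas = areas_within_threshold(sample, 5000)
print(trade_areas)
```

Even in this toy form you can see the bookkeeping pile up: grouping by store, sorting, tracking a running total, and deciding when to stop.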
What I did instead was write an SQL script to do it. I've color-coded it below to explain what I was doing.
SELECT ST_ConvexHull(ST_Collect(g)) as geometry, max(sumpop) as sumpop, name
INTO tradearea
FROM (
    SELECT a.name, SUM(a.totpop) AS sumpop, ST_Collect(a.geometry) as g
    FROM (
        SELECT stores.name, censusunit.totpop, censusunit.geometry,
               ST_Distance(censusunit.geometry, stores.geometry) as dist
        FROM stores, censusunit
    ) AS a,
    (
        SELECT stores.name, censusunit.totpop, censusunit.geometry,
               ST_Distance(censusunit.geometry, stores.geometry) as dist
        FROM stores, censusunit
    ) AS b
    WHERE a.name = b.name AND a.dist <= b.dist
    GROUP BY a.name, b.dist
) AS T1
WHERE sumpop < 5000
GROUP BY name
The middle portion in orange collects the names of the stores, the population of the census areas, the distance between each store and each census area, and the geometry of the census areas for each combination of stores and census areas. So, if you have 5 stores and 1,000 census areas, you would have a table with 5,000 rows.
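If you want to see the row counts and running totals without a PostGIS database handy, here is a small sketch that runs the same query shape against an in-memory SQLite database from Python. It is a stand-in and not the original setup: the tables hold made-up one-dimensional positions (`pos`), `abs()` of the difference replaces `ST_Distance`, and the geometry and convex hull steps are left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (name TEXT, pos REAL);
CREATE TABLE censusunit (id TEXT, totpop INTEGER, pos REAL);
-- Hypothetical toy data: 2 stores, 3 census areas on a line.
INSERT INTO stores VALUES ('A', 0), ('B', 10);
INSERT INTO censusunit VALUES ('c1', 2000, 1), ('c2', 2500, 3), ('c3', 1000, 6);
""")

# The "middle portion": every store paired with every census area.
# abs(...) is an assumed stand-in for ST_Distance on real geometries.
pairs_sql = """
SELECT stores.name AS name, censusunit.id AS id,
       censusunit.totpop AS totpop,
       abs(censusunit.pos - stores.pos) AS dist
FROM stores, censusunit
"""
pairs = conn.execute(pairs_sql).fetchall()
print(len(pairs), "store/area combinations")  # stores x census areas

# Self-join: for each store and each candidate radius (b.dist),
# sum the population of every area at least that close (a.dist <= b.dist).
running_sql = f"""
SELECT a.name AS name, b.dist AS radius, SUM(a.totpop) AS sumpop
FROM ({pairs_sql}) AS a, ({pairs_sql}) AS b
WHERE a.name = b.name AND a.dist <= b.dist
GROUP BY a.name, b.dist
"""

# Outer query: keep running totals under the threshold, take the largest.
trade_sql = f"""
SELECT name, MAX(sumpop) AS sumpop
FROM ({running_sql}) AS t1
WHERE sumpop < 5000
GROUP BY name
ORDER BY name
"""
trade = conn.execute(trade_sql).fetchall()
for name, sumpop in trade:
    print(name, sumpop)
```

With 2 stores and 3 census areas the middle portion yields 6 rows, the same multiplication that gives 5,000 rows for 5 stores and 1,000 census areas.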