Spatial is Not Special – Descriptive Statistics

OK, I admit it, this post really isn’t about “spatial”.  It’s more about descriptive statistics, but I wanted to introduce the ability to use SQL to produce descriptive statistics like the mean, standard deviation, skew, kurtosis, and coefficient of variation (CV).  As we say in our book Statistical Problem Solving in Geography, CV is an underutilized tool that has great utility for geographers.

 For the 2010 and 1980 State, Census Division, and Census Region, we show how the basic descriptive statistics change for different levels of spatial aggregation.  But, what we don’t show is how easy it is to calculate these statistics.  Using the Census vector file, with fields for Total Population (Pop2000 and Pop2010), we can easily compute these descriptive statistics as shown in Table 3.11 from the book.  Just note, that the dataset we are using in today’s example has the population from 2000 and 2010.  Also, to keep things simple we aren’t going to compute the SKEW or KURTOSIS (WordPress gets messy, and I’m more concerned that you understand what we are actually doing).

table3_11

Lets first deal with the State level data for the year 2000.

SELECT “STATES 2000” AS [Year], AVG([POP2000]) AS Mean,
                     STDEV([POP2000]) AS [Standard Deviation],
                     STDEV([POP2000])/AVG([POP2000])*100 AS [Coefficient of Variation]
FROM [USStates]

 You can see that we are using the AVG function to compute the average, the STDEV function to compute the Standard Deviation, and we are dividing those together to compute the Coefficient of Variation.  Easy enough.  But, our table also includes the Census Division and Region.  To compute these, we will issue the UNION ALL command in SQL which will take the results of the next query, and insert it at the end of the previous query.  So, for this query we have to remember to have the same field names returned.  Let’s look at the Census Division:

 SELECT “DIVISION 2000” AS [Year], AVG(PopByState) AS Mean,
       STDEV(PopByState) AS [Standard Deviation],
       STDEV(PopByState)/AVG(PopByState)*100 AS [Coefficient of Variation]
FROM   
(SELECT SUM([POP2000]) AS PopByState
        FROM [USStates]
       GROUP BY [DIVISION]
)

 Notice that the inner query is an aggregate clause, and we are going to GROUP the State populations by the DIVISION they are in (each State has a DIVISION code).  That will return a table with all the populations aggregated by the 9 Census Divisions.  From that query, we then compute the average of the 9 aggregated populations along with the standard deviation and the coefficient of variation.

 Finally, we can create a third query for the Census Regions as follows:

SELECT “REGION 2000” AS [Year], AVG(PopByState) AS Mean,
          STDEV(PopByState) AS [Standard Deviation],
          STDEV(PopByState)/AVG(PopByState)*100 AS [Coefficient of Variation]
FROM
(SELECT SUM([POP2000]) AS PopByState
    FROM [USStates]
    GROUP BY [REGION]
)

This is the same as the previous query (just a simple copy and paste) and changing [DIVISION] to [REGION].  That will provide the descriptive statistics for the 4 Census Regions.

Finally,  once we have figured out how to write the component parts, we will string them all together to form a single query with the UNION ALL command:

SELECT “STATES 2000” AS [Year], avg([POP2000]) AS Mean, stdev([POP2000]) AS [Standard Deviation], stdev([POP2000])/avg([POP2000])*100 AS [Coefficient of Variation]
FROM [USStates]

UNION ALL

SELECT “DIVISION 2000” AS [Year], avg(PopByState) AS Mean, stdev(PopByState) AS [Standard Deviation], stdev(PopByState)/avg(PopByState)*100 AS [Coefficient of Variation]
FROM
(SELECT sum([POP2000]) AS PopByState
FROM [USStates]
GROUP BY [DIVISION]
)

UNION ALL

SELECT “REGION 2000” AS [Year], avg(PopByState) AS Mean,
stdev(PopByState) AS [Standard Deviation],
stdev(PopByState)/avg(PopByState)*100 AS [Coefficient of Variation]
FROM
(SELECT sum([POP2000]) AS PopByState
FROM [USStates]
GROUP BY [REGION]
)
UNION ALL

SELECT “STATES 2010” AS [Year], avg([POP2010]) AS Mean,
stdev([POP2010]) AS [Standard Deviation],
stdev([POP2010])/avg([POP2010])*100 AS [Coefficient of Variation]
FROM [USStates]

UNION ALL

SELECT “DIVISION 2010” AS [Year], avg(PopByState) AS Mean,
stdev(PopByState) AS [Standard Deviation],
stdev(PopByState)/avg(PopByState)*100 AS [Coefficient of Variation]
FROM
(SELECT sum([POP2010]) AS PopByState
FROM [USStates]
GROUP BY [DIVISION]
)

UNION ALL

SELECT “REGION 2010” AS [Year], avg(PopByState) AS Mean,
stdev(PopByState) AS [Standard Deviation],
stdev(PopByState)/avg(PopByState)*100 AS [Coefficient of Variation]
FROM
(SELECT sum([POP2010]) AS PopByState
FROM [USStates]
GROUP BY [REGION]
)

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s