GIS Analysis of Overlapping Layers

My friend is attempting to quantify the area of different landuse values for different areas that are upstream from her sample points.  This means she needs sample points, landuse, and upstream areas (i.e. sub-watersheds).  The problem is, her watersheds overlap, the buffer distances around the sample points overlap themselves AND the watersheds, and she then needs to summarize the results.  It’s actually a tricky problem due to the overlaps: GIS software doesn’t really like it when features within a single layer overlap one another.  Also, if a buffer for a sample point overlaps two different watersheds, that becomes tricky too.

Sure, you can solve it with a few for loops, inserting the results into a new table, but that really is a hassle.  Also, I have to do it for different distances and different land cover types.

So, I once again turned to SQL – remember what I keep telling you – spatial is not special.  It’s just another data type.  This video steps you through performing a multi-ring buffer on overlapping objects from 3 different layers: sample points, watersheds, and land use.  As we step through the SQL, you’ll see how easy it is to put the query together.  And, at the end, you’ll see how flexible the query is should you want to change your objectives.  And, for good measure, we’ll throw in a little bit of parallel processing.
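To give a feel for why a join handles the overlaps so naturally, here is a minimal sketch using Python's built-in sqlite3 module.  It is not the PostGIS query from the video.  The table names, column names, and data are all made up, and the geometry is reduced to plain numbers: land use is a grid of 1x1 cells, watersheds are rectangles, and the "rings" are cumulative distances from the sample point.  In PostGIS the same predicates would be spatial functions such as ST_Intersects and ST_DWithin.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SUMMARY_SQL = """
    SELECT p.id, w.id, r.dist, c.landuse, SUM(c.area)
    FROM sample_points AS p
    CROSS JOIN watersheds AS w
    CROSS JOIN rings AS r
    JOIN landuse_cells AS c
      ON c.x BETWEEN w.xmin AND w.xmax
     AND c.y BETWEEN w.ymin AND w.ymax
     AND (c.x - p.x) * (c.x - p.x) + (c.y - p.y) * (c.y - p.y) <= r.dist * r.dist
    GROUP BY p.id, w.id, r.dist, c.landuse
    ORDER BY p.id, w.id, r.dist, c.landuse
"""


def build_demo_db():
    """Create an in-memory database holding tiny, made-up layers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sample_points (id INTEGER, x REAL, y REAL);
        CREATE TABLE watersheds (id TEXT, xmin REAL, ymin REAL, xmax REAL, ymax REAL);
        CREATE TABLE landuse_cells (x REAL, y REAL, landuse TEXT, area REAL);
        CREATE TABLE rings (dist REAL);
        INSERT INTO sample_points VALUES (1, 2.0, 2.0);
        -- Two watersheds that overlap between x = 1 and x = 3.
        INSERT INTO watersheds VALUES ('A', 0, 0, 3, 4), ('B', 1, 0, 4, 4);
        INSERT INTO rings VALUES (1.0), (2.0);
    """)
    # A 4x4 grid of unit cells: forest on the left half, urban on the right.
    cells = [(col + 0.5, row + 0.5, "forest" if col < 2 else "urban", 1.0)
             for col in range(4) for row in range(4)]
    conn.executemany("INSERT INTO landuse_cells VALUES (?, ?, ?, ?)", cells)
    return conn


def summarize_landuse(conn):
    """Return {(point, watershed, distance, landuse): area} for every combination."""
    return {(p, w, d, lu): area for p, w, d, lu, area in conn.execute(SUMMARY_SQL)}


if __name__ == "__main__":
    for key, area in summarize_landuse(build_demo_db()).items():
        print(key, area)
```

Because each row is a (point, watershed, distance, land use) combination, a cell that falls in both watersheds is simply counted once for each of them, and changing the distances only means changing the rows in the rings table.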

5(6) Ways to get data into Postgres/PostGIS

Lots of people ask me how to get data into Postgis.  This video is a quick 8 minute demonstration of 6 ways to get data into Postgres – 4 free (PGAdmin III, shp2pgsql, ogr2ogr, QGIS Database Manager) and 2 commercial (Manifold 8, Manifold 9).

Now, in this case I’m only working with a shapefile and Postgres.  I will do another post that works with other data (geodatabases, Autocad, etc.), and other databases (SQL Server, Oracle, etc.).  I hope you enjoy it.  Also, if you have other ways you like getting data into Postgis, leave a comment below….

 

 

Want to learn how to use PostGIS, QGIS, Manifold, or other advanced GIS tools?  Check out my courses here.  Some of the catchy titles are Big Data Analytics with GIS, Manage Spatial Data with Microsoft SQL Server, Enterprise GIS with Open Source Software, Spatial SQL With PostGIS, and Python for Geospatial.

DEM Processing – return of the beast?

I love telling the guys at Manifold when something isn’t working the way I think it should.  On one hand, I get the satisfaction of sticking it to some guys who I know are smarter than me.  On the other hand, I then get to see how quickly they move to action.  Over the years it is like clockwork:

  1.  Tell Manifold their process is slower than I would like.
  2. Manifold writes back that things are pretty good.
  3. Then, send them timing comparisons with ArcGIS.
  4. Sit back and wait – about a day later the issue is fixed, and Manifold is really fast for that task.

I think a big part of the above steps is #3.  Whenever you can show the team at Manifold that their process is slower than ArcGIS, MapInfo, QGIS, Postgis, or anything else, they quickly move on it.  Also, the nature of their builds is such that they can very quickly issue updates.  It’s one of the things I really like about the product: they love making improvements.

As an example, in my last post I evaluated the contour creation in ArcGIS, Manifold, and GDAL.  You’ll recall that the results were:

GDAL: 53m
Manifold: 26m
ArcGIS: 7m

A discussion ensued on the georeference.org forum on March 9, and a number of users validated my findings on their own.  Fast forward to March 12, and a new release of the Edge builds came out to address the issue.  

Now, the same process runs in 3m in Manifold 9.  For those with a stopwatch, it went from 1561s to 182s.  That is an improvement of about 8.6X.  Not bad for a 3-day turnaround.
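If you want to check the arithmetic yourself, here is a small Python sketch.  The two Manifold timings are the stopwatch values quoted above.  The GDAL and ArcGIS values are only the whole-minute figures from the results list converted to seconds, so treat those ratios as rough.

```python
def speedup(before_s, after_s):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


# Timings in seconds.  Only the two Manifold values were reported to the second.
MANIFOLD_BEFORE_S = 1561
MANIFOLD_AFTER_S = 182
GDAL_S = 53 * 60      # reported as 53m
ARCGIS_S = 7 * 60     # reported as 7m

if __name__ == "__main__":
    print(f"Manifold, old build vs new build: {speedup(MANIFOLD_BEFORE_S, MANIFOLD_AFTER_S):.2f}x")
    print(f"GDAL vs new Manifold build:       {speedup(GDAL_S, MANIFOLD_AFTER_S):.2f}x")
    print(f"ArcGIS vs new Manifold build:     {speedup(ARCGIS_S, MANIFOLD_AFTER_S):.2f}x")
```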

If you want to learn more about Manifold and big data analytics, check out my courses at www.gisadvisor.com.  

DEM Processing – who’s the beast

(yes, I meant to say Beast).  Today I began to experiment with processing some large raster DEMs, and making contours.  I used ArcGIS, Manifold 9, and GDAL.  This was a nice initial test, and you’ll see some of the results and what I had to wrestle with below.  I think a lot of my friends could offer me advice on how to improve this test, and I would welcome that.

Data

This was a fairly good sized DEM, so it can give us a nice challenge.  The particulars about the DEM are as follows:

Garrett County, MD (2015, 1m DEM)
51187 x 60832 pixels (for those of you keeping score, that’s 3.1 billion pixels!)
Size: 13.5GB
Format: ESRI Grid


Initial Results

I ran the contouring tests on a 64-bit Dell Precision T1700 with a 3.4GHz i7 processor and 18GB of RAM.  I used GDAL, Manifold Viewer (Edge version), and ArcGIS 10.3.

GDAL

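For GDAL, I simply used the following command to generate contours from the DEM at a 10.0 interval and write them to a shapefile, with the elevation stored in an attribute named elev: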

gdal_contour -a elev c:\temp\countywide\garrett15_1m\hdr.adf c:\temp\contour.shp -i 10.0

 

The process completed in 53m.  You’ll notice which CPUs were firing.  They weren’t flat, and it seemed that more than one was working:

[Image: CPU usage during the GDAL run]

Manifold 9

The next test was to use Manifold.  I had Manifold 9 import the DEM data.  It took about 2 minutes to import the 13.5GB raster into Manifold.  From there, I used the contouring transform tool with “Run in parallel” checked.

[Image: CPU usage during the Manifold run]

The process completed in 1561s (26m 1s).  Twice as fast as GDAL.

ArcGIS

Finally, I fired up ArcGIS and used the Contour (Spatial Analyst Tool).  You can see that ArcGIS does not use all the cores on the CPU:

[Image: CPU usage during the ArcGIS run]

However, I was surprised to see that the contours were completed in 7m 45 seconds – by far the fastest result.  I thought I must have done something wrong, so I ran it again, and instead of creating a geodatabase, I wrote the results out to a shapefile.  The timings were almost identical.

Display and Alignment

The next step, of course, was to look for any differences in the results.  As you can see, Garrett County, even at a 10m contour interval, has fairly dense contours.

 

 

I decided to bring all three results into a single window.  You’ll see the red, green, and yellow (I changed the line thicknesses).  All three products yield the same results.  So, they are obviously using a similar algorithm.

[Image: the three contour results overlaid in a single window]

What does it all mean?

I was surprised that GDAL was so much slower.  Most people really like the speed of GDAL, but ArcGIS totally crushed it.  Also, seeing Manifold totally pinning the CPU in the parallel processing capabilities, I was surprised that ArcGIS was many times faster.  I wonder if using Manifold 9, rather than Manifold Viewer (Edge) might make a difference.   If anyone wants access to the dataset I used, feel free to email me.

USM Award

I was honored to receive the 2018 University System of Maryland (USM) Board of Regents Faculty Award for Teaching.  This represents the highest award for teaching in the University System, and out of thousands of USM faculty, only 4 are chosen each year.   I want to thank all my colleagues in the GIS field who wrote letters on my behalf*.  It was humbling to read the letters, and gratifying to have made so many good friends over the years.

I was particularly struck by the fact that this award could have easily gone to a poet, philosopher, or physicist, but instead went to some GIS guy.  This is an opportunity to celebrate that GIScience is making significant headway into education, and the education community is showing their appreciation of it.


*thank you to David DiBiase, ESRI; Owen MacDonald, University of Edinburgh; Jack Ma, University of Maryland at College Park; Julia Fisher, State of Maryland Geographic Information Officer; Michael Scott, Dean of the Henson School of Science and Technology at Salisbury University; Dr. Janet Dudley-Eshbach, President of Salisbury University

Flow Apportionment with SQL

Here’s a cool one from years back – it is a video I made in 2009.  I was doing some work on pipelines, and we were interested in determining the flow through a directed network.  I thought I was in way over my head when I heard what my friends wanted to do.  The algorithm I wanted to try out was the Ford-Fulkerson algorithm.

On the way back from Las Vegas (I was speaking at a Manifold user conference), I tried to replicate this in Manifold 8 – never expecting it to work, I just tinkered with a few pieces of it.  To my surprise it started to work.  Then, within an hour, I actually had the application correctly scripted.

I was going to just stop and enjoy the rest of my flight home, but I usually can’t get this stuff out of my head, so I kept going.  By the time I landed, I had the application running under different scenarios.

Anyway, here is a video I created when I got off the plane.  It shows a very crude, but interesting first step in doing flow apportionment with SQL in Manifold 8.
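For anyone who hasn’t met Ford-Fulkerson, here is a minimal sketch in Python of its breadth-first variant (usually called Edmonds-Karp).  This is not the Manifold 8 SQL from the video.  The little pipe network, its node names, and its capacities are made up for illustration, and the per-pipe flows assume no pair of pipes runs in both directions between the same two nodes.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Return (total_flow, {(u, v): flow}) for a directed network.

    capacity is a dict of dicts: capacity[u][v] is the capacity of pipe u -> v.
    """
    # Residual graph: forward edges start at full capacity, reverse edges at 0.
    residual = {source: {}, sink: {}}
    for u, neighbours in capacity.items():
        for v, cap in neighbours.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    total = 0
    while True:
        # Breadth-first search for a path that still has spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, spare in residual[u].items():
                if spare > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left, so the flow is maximal

        # Find the bottleneck along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

    # Flow on each original pipe is its capacity minus what is still spare.
    flows = {(u, v): cap - residual[u][v]
             for u, neighbours in capacity.items() for v, cap in neighbours.items()}
    return total, flows


if __name__ == "__main__":
    pipes = {"S": {"A": 10, "B": 5}, "A": {"B": 15, "T": 5}, "B": {"T": 10}}
    total, flows = max_flow(pipes, "S", "T")
    print(total, flows)
```

The per-pipe flows are the interesting part for apportionment: they tell you how much of the total ends up moving through each segment.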

If you want to learn how to use spatial SQL to do things like this, check out my online video classes at gisadvisor.

Open Street Map in Manifold Viewer

I recently downloaded the entire Open Street Map for the United States.  That’s an 8GB download, but more importantly, it is a 140GB file!!  It took over a day to import the file into Manifold (someone on the forum used an SSD, and they were able to import it in 3.5 hours).  But, once the data was in Manifold, I was able to instantaneously zoom, pan, select, and identify features.  Have a look at the video below to see how fast this was:

 

This is actually pretty cool: one file, just click, and you are able to work directly with the data.

The ability for Manifold to instantaneously open the 140GB data and zoom and pan around is pretty incredible.  So, my question is: if you had this data locally on your computer, how might you use it?  Comment below…

Keep your eyes out for a new course offering on Big Data Analytics with GIS.  I’m hoping to release the new course before February 1.

Big data analytics with GIS – the CSULB pilot

Well folks, it’s happening. I’m about to take one of my most adventurous steps into these training classes yet.

With the release of Manifold 9, I’m going to offer a big data analytics class that includes gigabytes of data, multiple databases, statistical processing, and parallel processing. And, it is something you will be able to participate in using only freely available software. Imagine that, a big data analytics class with free software.

Delivering 20GB of data at a bring your own device (BYOD) training class is a challenge. Also, with this high level work, it is a further challenge to decide what can fit into a one day workshop.

Thankfully, the California State University at Long Beach provided me with an opportunity to teach my workshop to their students this week. It was a blast!

More importantly, I learned a lot about how to put together a deep-dive class like this. 8 hours is simply too short!!

The students loved the workshop, and I loved teaching it. Stay tuned, as a live workshop will be coming to a city near you, and an abbreviated online workshop will roll out in the next month.


 

 

 

Work smarter – not larger

When you were in Statistics 101, and the Professor said, “OK, we are now going to learn about the Central Limit Theorem,” did you tune out? Did you sarcastically say, “When is someone going to grab me and order me to tell them about the Central Limit Theorem?” Come on, admit it, you did.  Well, so did I – I was 18 years old, and couldn’t care less.

Well, you know what? Understanding the Central Limit Theorem has really big implications for big data analytics. Check out this 20 minute video, and you’ll see that by applying the Central Limit Theorem and some statistical theory, you can approximate the results of an expensive multi-server implementation for interrogating really large databases.

I’ll show you how you can obtain very precise estimates on really large databases by simply applying some basic statistics you should have learned Freshman year (but you were too busy partying, weren’t you?)
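Here is a minimal sketch of the idea in Python, using only the standard library.  It is not the workflow from the video.  The "big table" is a made-up, heavily skewed list of numbers, and the sample size is an arbitrary choice.  The point is that the Central Limit Theorem lets you put a confidence interval around a mean computed from a small random sample, even when the underlying data is far from normal.

```python
import math
import random
import statistics


def estimate_mean(population, sample_size, rng, z=1.96):
    """Estimate the population mean from a simple random sample.

    Returns (mean, standard_error, (low, high)), where (low, high) is an
    approximate 95% confidence interval when z is 1.96.  The finite population
    correction is ignored, which is reasonable when the sample is a tiny
    fraction of the population.
    """
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, std_error, (mean - z * std_error, mean + z * std_error)


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-in for a very large, skewed column (hypothetical data).
    population = [rng.expovariate(1 / 50) for _ in range(200_000)]
    mean, std_error, (low, high) = estimate_mean(population, 2_000, rng)
    print(f"sample estimate: {mean:.2f} (95% CI {low:.2f} to {high:.2f})")
    print(f"full-scan mean:  {statistics.fmean(population):.2f}")
```

The sample here touches 1% of the rows, and quadrupling the sample size would only halve the width of the interval, which is why a modest sample often gets you close enough.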

 

Stay tuned, I’ll be coming out with a big data analytics class in the New Year.  If you want to learn more about SQL, programming, open source GIS, or Manifold, check out courses at www.gisadvisor.com.  

Salisbury Gives Back – 4 new GIS tools

My students had a great time presenting the tools they created in my GIS Programming class.  I want to thank our friends from Esri for joining us via GoTo Meeting.   Their feedback and advice were greatly appreciated.

Not only does this add new tools to fill a specific niche with ArcGIS, it also shows that Arcpy is simple enough to quickly create these tools.  Further, it shows just how good the Salisbury University Undergraduate Geography students are.  Click the links below, and you’ll see the presentations (we are currently putting the code together so people can download it).  Stay tuned to see that link.

Join Count Analysis: This tool focuses on the spatial autocorrelation method of join count analysis that evaluates area features with binary variables to determine if the data is random, dispersed or clustered.  The tool calculates the expected and observed dissimilar joins, the Z score, and associated p-value.  The presenters are Cody Garcia, Kyle Lane, and Alex Nowak.

 

Quadrat Analysis: This tool focuses on the spatial autocorrelation of point patterns across a landscape. Using points and a grid of quadrats, this tool measures whether a point pattern is random, dispersed, or clustered.  The tool calculates the variance to mean ratio, Chi-square value, and associated p-value.  In addition, the tool thematically shades the quadrats based on their counts.  The presenters are Jeremy Gencavage, Grant Chalfin, and Zach Radziewicz.
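To make the statistics concrete, here is a minimal sketch in plain Python of the quantities a quadrat analysis reports.  It is not the students' Arcpy tool.  The function names and the toy data are my own, and the p-value is left out because it needs a chi-square distribution function that the standard library does not provide.

```python
import statistics


def count_points_in_quadrats(points, xmin, ymin, cell_size, n_cols, n_rows):
    """Count how many (x, y) points fall in each square quadrat of a grid.

    Counts are returned row by row, starting at the (xmin, ymin) corner.
    Points outside the grid are ignored.
    """
    counts = [0] * (n_cols * n_rows)
    for x, y in points:
        col = int((x - xmin) // cell_size)
        row = int((y - ymin) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            counts[row * n_cols + col] += 1
    return counts


def quadrat_stats(counts):
    """Return (variance_to_mean_ratio, chi_square, degrees_of_freedom)."""
    mean = statistics.fmean(counts)
    if mean == 0:
        raise ValueError("no points fall inside the quadrats")
    vmr = statistics.variance(counts) / mean  # sample variance (n - 1)
    dof = len(counts) - 1
    return vmr, dof * vmr, dof


if __name__ == "__main__":
    points = [(0.5, 0.5), (0.6, 0.4), (1.5, 1.5)]
    counts = count_points_in_quadrats(points, 0, 0, 1.0, 2, 2)
    print(counts, quadrat_stats(counts))
```

A ratio near 1 suggests a random pattern, a ratio well below 1 a dispersed pattern, and a ratio well above 1 a clustered one.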

 

Stratified Sampling Tool: This is not a spatial tool, but rather a tool that takes stratified samples and generates point and interval estimates from the strata.  Generating estimates from stratified sampling is a very powerful statistical technique, and provides significant improvements over simple random sampling.  The presenters are Bryan VanGiesen and Brian Hiller.
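As a rough illustration of what point and interval estimates from stratified samples look like, here is a minimal Python sketch of the textbook estimator for a population mean.  It is not the students' tool.  The input layout (stratum size plus a list of sampled values) and the numbers are assumptions, and the interval uses a normal multiplier, so with very small samples a t multiplier would be the more careful choice.

```python
import math
import statistics


def stratified_mean(strata, z=1.96):
    """Estimate a population mean from stratified samples.

    strata is a list of (stratum_size, sampled_values) pairs.
    Returns (mean, standard_error, (low, high)).
    """
    population_size = sum(size for size, _ in strata)
    mean = 0.0
    variance = 0.0
    for size, sample in strata:
        n = len(sample)
        weight = size / population_size          # share of the population
        fpc = 1 - n / size                       # finite population correction
        mean += weight * statistics.fmean(sample)
        variance += weight ** 2 * fpc * statistics.variance(sample) / n
    std_error = math.sqrt(variance)
    return mean, std_error, (mean - z * std_error, mean + z * std_error)


if __name__ == "__main__":
    # Two hypothetical strata: 100 units with 3 sampled, 300 units with 3 sampled.
    strata = [(100, [10, 12, 14]), (300, [20, 22, 24])]
    print(stratified_mean(strata))
```

Each stratum's sample mean is weighted by how much of the population the stratum represents, which is where the gain over simple random sampling comes from when the strata differ from one another.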

 

ArcGIS and Google Route Optimization Tool: This tool integrates ArcGIS with the Google Maps API so that users can route the ArcGIS features over a Google network, using the Google Routing Engine.  Users can input their own geodatabase feature classes, and the tool returns the Google routes as a geodatabase feature class, along with driving directions.  The presenters are Meghan Murphy, Thomas Simpson, and Liam Doherty.