About Art Lembo

I am a Professor at Salisbury University where I teach courses in quantitative geography and GIS.

Finding “Dangles” with PostGIS

Do you have a set of lines and need to find out whether any of them have “dangle” nodes?  A dangle is a line segment that overhangs another line segment.  Now, some dangles are valid, like a pipe that terminates in a cul-de-sac.

A few people have posted about this already, but I figured I would give it a shot as well, as I think my SQL is a little more terse.  Anyway, here is the query, and we’ll talk about it line by line:

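SELECT DISTINCT g1 INTO dangles                -- each dangle node once, saved to a new table
FROM plines,
    (SELECT g AS g1 FROM
         (SELECT g, count(*) AS cnt            -- how many nodes sit at each location
          FROM
              (SELECT ST_StartPoint(g) AS g FROM plines
               UNION ALL
               SELECT ST_EndPoint(g) AS g FROM plines) AS T1
          GROUP BY g) AS T2
     WHERE cnt = 1) AS T3                      -- nodes with no other node on top of them
WHERE ST_Distance(g1, g) BETWEEN 0.01 AND 2;   -- near, but not touching, a line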

The first thing to notice is the innermost SELECT statement.  We are using ST_StartPoint and ST_EndPoint to grab the endpoints of the lines; these we’ll call nodes.

The next thing to notice is where we get the count of the nodes.  We grab all the nodes, but use the GROUP BY clause to return the number of nodes that occupy the same location.  Where two lines connect, there will be 2 nodes at that spot (one from the first line and one from the second).  But a “dangle” will have only one node occupying its location.  This is where the next section of SQL comes into play.

What we want to do is select only those nodes where the count (cnt) is equal to 1.  That means the node is just sitting there in space.  It is a “dangle”.  But not all dangles are created equal, as I said above.  That final WHERE clause lets me specify how far a dangle node can be from another line.  In the example above, we are choosing nodes that are at least 0.01 but no more than 2 units (2m in my data) from a line.  The last bit of SQL we have to consider is the DISTINCT clause.  A node can be near more than one line, and we don’t want to count it twice, so DISTINCT eliminates the duplicates.
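If you want to see the same three steps outside the database, here is a minimal Python sketch of the logic, assuming each line is stored as a list of (x, y) vertices in projected units. It is not PostGIS: point_line_distance is my own stand-in for ST_Distance, and nodes are matched by exact coordinate equality, so connected endpoints must share identical coordinates for the count to work.

```python
import math
from collections import Counter


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamped to its two ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p, line):
    """Stand-in for ST_Distance(point, line): the nearest segment wins."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def find_dangles(lines, min_dist=0.01, max_dist=2.0):
    # Step 1 (UNION ALL subquery): start and end node of every line.
    nodes = [line[0] for line in lines] + [line[-1] for line in lines]
    # Step 2 (GROUP BY g ... WHERE cnt = 1): nodes alone at their location.
    counts = Counter(nodes)
    lonely = [n for n in nodes if counts[n] == 1]
    # Step 3 (final WHERE): lonely nodes near, but not on, some line.
    # Collecting into a set plays the role of DISTINCT.
    return {
        n for n in lonely
        if any(min_dist <= point_line_distance(n, line) <= max_dist for line in lines)
    }


lines = [
    [(0, 0), (10, 0)],   # horizontal line
    [(10, 0), (20, 0)],  # continues it; shares the node (10, 0)
    [(5, 1), (5, 10)],   # starts 1 unit short of the horizontal line
]
print(find_dangles(lines))  # {(5, 1)}
```

The node a line owns is always at distance 0 from that same line, which is why the lower bound of 0.01 matters: without it, every lonely node would qualify.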

That’s it!  Pretty easy.  Think of the ST_Distance function as a variant of the basic SQL to find dangles.  There are other variants we could add to this if we’d like, such as requiring the line the dangle touches to be shorter than 5m.  That would just be a matter of adding another condition to the WHERE clause.

Multi-ring (non-overlapping) Buffers with Manifold 9

In my last post, I showed you how to create multi-ring, non-overlapping buffers with spatial SQL in PostGIS.  In this post, I want to do the same thing, but with Manifold GIS.  To be honest, it is pretty much the same thing, although I think Manifold is a little easier because its SQL has a FIRST aggregate function, whereas in PostGIS we had to use DISTINCT ON.  Either way, it is pretty easy, so now database and SQL professionals have a way to create multi-ring buffers entirely in SQL.
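The query itself is only in the video, but the trick both versions rely on is picking one row per group. Here is a small Python sketch of what SELECT DISTINCT ON (key) ... ORDER BY key, value does in PostgreSQL; a FIRST-style aggregate over ordered rows plays a similar role. The row layout and field names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order_by):
    """Mimic SELECT DISTINCT ON (key) * FROM rows ORDER BY key, order_by:
    sort, then keep only the first row of each key group."""
    ordered = sorted(rows, key=lambda r: (r[key], r[order_by]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Hypothetical rows: several candidate rows per id, each with a buffer distance.
rows = [
    {"id": 1, "dist": 300},
    {"id": 1, "dist": 100},
    {"id": 2, "dist": 200},
    {"id": 1, "dist": 200},
]
print(distinct_on(rows, "id", "dist"))
# [{'id': 1, 'dist': 100}, {'id': 2, 'dist': 200}]
```

The ORDER BY is what makes DISTINCT ON deterministic; without it, PostgreSQL is free to return any row from each group.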

 If you want to learn more about SQL, programming, open source GIS, or Manifold GIS, check out courses at www.gisadvisor.com.  

Multi-Ring (non-overlapping) Buffers with PostGIS

I was interested in creating multi-ring buffers, but with a twist: I didn’t want the buffers to overlap one another.  In other words, if I had concentric buffers with distances of 100, 200, and 300 around a point, I wanted those buffers to represent distances of 0-100, 100-200, and 200-300, not overlap one another.  You can actually do that with the PostGIS function ST_SymDifference, but there are a few nuances that you have to be aware of.
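The banding itself is simple arithmetic. Here is a minimal Python sketch, assuming concentric circular rings around a single point: each band is the larger buffer minus the next smaller one (when one buffer sits entirely inside the other, their symmetric difference is exactly that ring), and a location falls in the first band whose outer distance it does not exceed. The function names are mine, for illustration only.

```python
import bisect
import math


def ring_bands(distances):
    """Cumulative buffer distances -> non-overlapping (inner, outer) bands."""
    outer = sorted(distances)
    inner = [0] + outer[:-1]
    return list(zip(inner, outer))


def band_index(d, distances):
    """Band a location at distance d falls into (outer edge inclusive),
    or None if it lies beyond the largest buffer."""
    outer = sorted(distances)
    i = bisect.bisect_left(outer, d)
    return i if i < len(outer) else None


def band_area(inner, outer):
    """Area of a ring: the larger circular buffer minus the smaller one."""
    return math.pi * (outer ** 2 - inner ** 2)


distances = [100, 200, 300]
print(ring_bands(distances))       # [(0, 100), (100, 200), (200, 300)]
print(band_index(150, distances))  # 1, i.e. the 100-200 band
```

Because the rings don’t overlap, their areas add up to the area of the largest buffer, which is a handy sanity check on the SQL output.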

Unlike some of my longer videos, this one starts out with the answer, and then we’ll walk through all the SQL.  You’ll see it isn’t so bad, and you’ll continue to see that spatial is not special!  It’s only 20 minutes long, and the answer is shown in the first minute.

In the video I’ll slowly walk you through all the spatial SQL to create buffers for the points and trim the overlaps so that no buffers overlap.  You’ll learn some really cool Postgres commands, including:

 ST_Buffer, ST_SymDifference, DISTINCT ON, and SET WITH OIDS.

I was amazed that with a few SQL tweaks, we were able to turn ordinary buffers into more useful non-overlapping buffers.  I hope you enjoy the video.

I’d like to create more videos like this, so please leave some comments below so that I know others want me to continue these kinds of tutorials.

GIS Analysis of Overlapping Layers

My friend is trying to quantify the area of different land use values for the areas that are upstream from her sample points.  This means she needs sample points, land use, and upstream areas (i.e. sub-watersheds).  The problem is that her watersheds overlap, the buffers around the sample points overlap one another AND the watersheds, and she then needs to summarize the results.  It’s actually a tricky problem because of the overlaps: GIS software doesn’t really like it when features within a single layer overlap one another.  Also, if a buffer for a sample point overlaps two different watersheds, that becomes tricky too.

Sure, you can solve it with a few for loops, inserting the results into a new table, but that really is a hassle, especially since I have to do it for different distances and different land cover types.

So, I once again turned to SQL – remember what I keep telling you – spatial is not special.  It’s just another data type.  This video steps you through performing a multi-ring buffer on overlapping objects from 3 different layers: sample points, watersheds, and land use.  As we step through the SQL, you’ll see how easy it is to put the query together.  And, at the end, you’ll see how flexible the query is should you want to change your objectives.  And, for good measure, we’ll throw in a little bit of parallel processing.
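To show why working one combination at a time sidesteps the overlap problem, here is a toy Python sketch of the summary. It is a simplification, not the query from the video: every shape is an axis-aligned rectangle (xmin, ymin, xmax, ymax) and each buffer is a square stand-in for a circle, so the areas are only illustrative. Because each (sample, distance, watershed) combination is clipped on its own, overlapping watersheds and overlapping buffers never interfere with one another.

```python
from collections import defaultdict


def box_intersection(a, b):
    """Overlap of two boxes (xmin, ymin, xmax, ymax), or None if they don't overlap."""
    xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
    xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
    if xmin >= xmax or ymin >= ymax:
        return None
    return (xmin, ymin, xmax, ymax)


def box_area(b):
    return (b[2] - b[0]) * (b[3] - b[1])


def landuse_summary(samples, watersheds, landuse, distances):
    """Land use area inside (buffer around sample) intersected with (watershed),
    keyed by (sample id, watershed id, buffer distance, land use class)."""
    totals = defaultdict(float)
    for sid, (x, y) in samples.items():
        for d in distances:
            buffer_box = (x - d, y - d, x + d, y + d)  # square stand-in for a circle
            for wid, wbox in watersheds.items():
                clip = box_intersection(buffer_box, wbox)
                if clip is None:
                    continue
                for cls, lbox in landuse:
                    piece = box_intersection(clip, lbox)
                    if piece is not None:
                        totals[(sid, wid, d, cls)] += box_area(piece)
    return dict(totals)


samples = {"s1": (0, 0)}
watersheds = {"w1": (-10, -10, 10, 10), "w2": (0, -10, 20, 10)}  # these overlap
landuse = [("forest", (-100, -100, 0, 100)), ("urban", (0, -100, 100, 100))]
totals = landuse_summary(samples, watersheds, landuse, [1, 5])
print(totals[("s1", "w2", 5, "urban")])  # 50.0
```

Changing the objective (another distance, another land cover class) only means changing the inputs, which mirrors the flexibility of the SQL approach.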

5(6) Ways to get data into Postgres/PostGIS

Lots of people ask me how to get data into PostGIS.  This video is a quick 8-minute demonstration of 6 ways to get data into Postgres: 4 free (PGAdmin III, shp2pgsql, ogr2ogr, QGIS Database Manager) and 2 commercial (Manifold 8, Manifold 9).
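For the two command-line options, here is a hedged Python sketch that only assembles the commands; the file, table, and database names and the SRID are hypothetical placeholders. shp2pgsql writes SQL to standard output (-s sets the SRID, -I adds a spatial index), which you pipe into psql; ogr2ogr writes straight to the database through its PostgreSQL driver, and -nln names the new table.

```python
import shlex


def shp2pgsql_command(shapefile, table, srid, dbname):
    """Shell pipeline: shp2pgsql emits SQL (-s sets the SRID,
    -I adds a spatial index) and psql runs it against dbname."""
    q = shlex.quote
    return f"shp2pgsql -s {srid} -I {q(shapefile)} {q(table)} | psql -d {q(dbname)}"


def ogr2ogr_args(shapefile, table, dbname):
    """Argument list for ogr2ogr's PostgreSQL driver; -nln names the new table."""
    return ["ogr2ogr", "-f", "PostgreSQL", f"PG:dbname={dbname}", shapefile, "-nln", table]


print(shp2pgsql_command("parcels.shp", "public.parcels", 4326, "gisdb"))
# shp2pgsql -s 4326 -I parcels.shp public.parcels | psql -d gisdb
```

To actually run the ogr2ogr version, you could pass the list to subprocess.run(..., check=True), assuming GDAL is on your PATH and the database accepts your login.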

Now, in this case I’m only working with a shapefile and Postgres.  I will do another post that works with other data (geodatabases, AutoCAD, etc.) and other databases (SQL Server, Oracle, etc.).  I hope you enjoy it.  Also, if you have other ways you like to get data into PostGIS, leave a comment below.

Want to learn how to use PostGIS, QGIS, Manifold, or other advanced GIS tools?  Check out my courses here.  Some of the catchy titles are Big Data Analytics with GIS, Manage Spatial Data with Microsoft SQL Server, Enterprise GIS with Open Source Software, Spatial SQL with PostGIS, and Python for Geospatial.

DEM Processing – return of the beast?

I enjoy telling the guys at Manifold when something isn’t working the way I think it should.  On one hand, I get the satisfaction of sticking it to some guys who I know are smarter than me.  On the other hand, I then get to see how quickly they move to action.  Over the years it has been like clockwork:

  1. Tell Manifold their process is slower than I would like.
  2. Manifold writes back that things are pretty good.
  3. Then, send them timing comparisons with ArcGIS.
  4. Sit back and wait: about a day later the issue is fixed, and Manifold is really fast for that task.

I think the key step above is #3.  Whenever you can show the team at Manifold that their process is slower than ArcGIS, MapInfo, QGIS, PostGIS, or anything else, they quickly move on it.  Also, the nature of their builds is such that they can issue updates very quickly.  It’s one of the things I really like about the product: they love making improvements.

As an example, in my last post I evaluated the contour creation in ArcGIS, Manifold, and GDAL.  You’ll recall that the results were:

GDAL: 53m
Manifold: 26m
ArcGIS: 7m

A discussion ensued on the georeference.org forum on March 9, and a number of users validated my findings on their own.  Fast forward to March 12, and a new release of the Edge builds came out to address the issue.  

Now, the same process runs in 3m in Manifold 9.  For those with a stopwatch, it went from 1561s to 182s.  That is an improvement of roughly 8.6X.  Not bad for a 3-day turnaround.
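For anyone who wants to check the math, here is a quick Python sketch using the timings reported in these posts (GDAL was only reported to the minute, and the ArcGIS figure is the 7m 45s run).

```python
def mmss(seconds):
    """Format a duration in seconds as 'Xm Ys'."""
    m, s = divmod(seconds, 60)
    return f"{m}m {s}s"


timings = {                    # seconds
    "GDAL": 53 * 60,           # reported only as 53m
    "Manifold 9 (before)": 1561,
    "ArcGIS": 7 * 60 + 45,
    "Manifold 9 (after)": 182,
}

print(mmss(timings["Manifold 9 (before)"]))  # 26m 1s
print(mmss(timings["Manifold 9 (after)"]))   # 3m 2s
print(round(timings["Manifold 9 (before)"] / timings["Manifold 9 (after)"], 2))  # 8.58
print(round(timings["GDAL"] / timings["Manifold 9 (before)"], 2))  # 2.04
print(round(timings["Manifold 9 (before)"] / timings["ArcGIS"], 2))  # 3.36
```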

If you want to learn more about Manifold and big data analytics, check out my courses at www.gisadvisor.com.  

DEM Processing – who’s the beast

(yes, I meant to say Beast).  Today I began to experiment with processing some large raster DEMs, and making contours.  I used ArcGIS, Manifold 9, and GDAL.  This was a nice initial test, and you’ll see some of the results and what I had to wrestle with below.  I think a lot of my friends could offer me advice on how to improve this test, and I would welcome that.

Data

This was a fairly good sized DEM, so it can give us a nice challenge.  The particulars about the DEM are as follows:

Garrett County, MD (2015, 1m DEM)
51,187 × 60,832 pixels (for those of you keeping score, that’s 3.1 billion pixels!)
Size: 13.5GB
Format: ESRI Grid (ArcInfo binary grid)
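The pixel count is easy to verify. In this sketch I also assume 4-byte floating-point cells, which the post does not state, to get a rough uncompressed size.

```python
width, height = 51_187, 60_832   # pixel dimensions as listed in the post
pixels = width * height
print(f"{pixels:,}")             # 3,113,807,584

# Assumption: 4-byte (float32) cells; the post does not give the cell type.
raw_bytes = pixels * 4
print(f"{raw_bytes / 1e9:.2f} GB")   # 12.46 GB
```

Under that assumption, the raw cell values alone come to roughly 12.5GB, in the same ballpark as the 13.5GB on disk.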

[Figure: Garrett County DEM]

Initial Results

I ran the contouring tests on a 64-bit Dell Precision T1700 with an i7 3.4GHz processor and 18GB of RAM.  I used GDAL, Manifold Viewer (Edge version), and ArcGIS 10.3.

GDAL

For GDAL, I simply used the following command to generate contours at a 10m interval from the DEM and write them to a shapefile:

gdal_contour -a elev -i 10.0 c:\temp\countywide\garrett15_1m\hdr.adf c:\temp\contour.shp

The process completed in 53m.  You can see which CPUs were busy: the usage graphs weren’t flat, and it seemed that more than one core was working:

[Figure: CPU usage while gdal_contour ran]
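The -i 10.0 in the gdal_contour command sets the contour interval (and -off, which I didn’t use, sets the offset the interval is measured from). Here is a small Python sketch of which elevations that implies for a given elevation range; the 283.4 to 331.0 range in the example is made up, not taken from the Garrett County DEM.

```python
import math


def contour_levels(zmin, zmax, interval, offset=0.0):
    """Elevations offset + k * interval that fall inside [zmin, zmax]."""
    first = math.ceil((zmin - offset) / interval)
    last = math.floor((zmax - offset) / interval)
    return [offset + k * interval for k in range(first, last + 1)]


# Hypothetical elevation range, not taken from the Garrett County DEM.
print(contour_levels(283.4, 331.0, 10.0))  # [290.0, 300.0, 310.0, 320.0, 330.0]
```

Halving the interval roughly doubles the number of levels, and with it the number of contour lines each tool has to trace.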

Manifold 9

The next test was to use Manifold.  I had Manifold 9 import the DEM data.  It took about 2 minutes to import the 13.5GB raster into Manifold.  From there, I used the contouring transform tool with “Run in parallel” checked.

[Figure: CPU usage while Manifold ran the contour transform]

The process completed in 1561s (26m 1s), roughly twice as fast as GDAL.

ArcGIS

Finally, I fired up ArcGIS and used the Contour tool (Spatial Analyst).  You can see that ArcGIS does not use all the cores on the CPU:

[Figure: CPU usage while ArcGIS ran the Contour tool]

However, I was surprised to see that the contours were completed in 7m 45 seconds – by far the fastest result.  I thought I must have done something wrong, so I ran it again, and instead of creating a geodatabase, I wrote the results out to a shapefile.  The timings were almost identical.

Display and Alignment

The next question, of course, was whether there were any differences in the results.  As you can see, Garrett County, even at a 10m interval, has fairly dense contours.

I decided to bring all three results into a single window.  You’ll see the red, green, and yellow lines (I changed the line thicknesses).  All three products yield the same results, so they are presumably using a similar algorithm.

[Figure: GDAL, Manifold, and ArcGIS contours overlaid]

What does it all mean?

I was surprised that GDAL was so much slower.  Most people really like the speed of GDAL, but ArcGIS totally crushed it.  Also, after seeing Manifold’s parallel processing totally pin the CPU, I was surprised that ArcGIS was still about three times faster.  I wonder if using Manifold 9, rather than Manifold Viewer (Edge), might make a difference.  If anyone wants access to the dataset I used, feel free to email me.

USM Award

I was honored to receive the 2018 University System of Maryland (USM) Board of Regents Faculty Award for Teaching.  This represents the highest award for teaching in the University System, and out of thousands of USM faculty, only 4 are chosen each year.   I want to thank all my colleagues in the GIS field who wrote letters on my behalf*.  It was humbling to read the letters, and gratifying to have made so many good friends over the years.

I was particularly struck by the fact that this award could have easily gone to a poet, philosopher, or physicist, but instead went to some GIS guy.  This is an opportunity to celebrate that GIScience is making significant headway into education, and the education community is showing their appreciation of it.

*thank you to David DiBiase, ESRI; Owen MacDonald, University of Edinburgh; Jack Ma, University of Maryland at College Park; Julia Fisher, State of Maryland Geographic Information Officer; Michael Scott, Dean of the Henson School of Science and Technology at Salisbury University; Dr. Janet Dudley-Eshbach, President of Salisbury University

Flow Apportionment with SQL

Here’s a cool one from years back: a video I made in 2009.  I was doing some work on pipelines, and we were interested in determining the flow through a directed network.  When I heard what my friends wanted to do, I thought I was way over my head.  The algorithm I wanted to try out was the Ford-Fulkerson algorithm.

On the way back from Las Vegas (I was speaking at a Manifold user conference), I tried to replicate this in Manifold 8 – never expecting it to work, I just tinkered with a few pieces of it.  To my surprise it started to work.  Then, within an hour, I actually had the application correctly scripted.

I was going to just stop and enjoy the rest of my flight home, but I usually can’t get this stuff out of my head, so I kept going.  By the time I landed, I had the application running under different scenarios.

Anyway, here is a video I created when I got off the plane.  It shows a very crude, but interesting first step in doing flow apportionment with SQL in Manifold 8.
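The Manifold 8 script itself is only in the video, so here is a generic Python sketch of the Ford-Fulkerson idea instead, using breadth-first search to find augmenting paths (the Edmonds-Karp variant). The network format (a dict of pipe capacities) and the names are hypothetical, and it assumes no pair of nodes has pipes in both directions; the per-edge flows at the end show how the total is apportioned across the pipes.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Ford-Fulkerson with BFS augmenting paths (Edmonds-Karp).
    capacity: {node: {neighbor: capacity}} for directed pipes.
    Returns (total flow, {(u, v): flow on that pipe})."""
    # Residual capacities, plus reverse edges that start at 0.
    residual = {}
    for u, edges in capacity.items():
        for v, c in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)
    total = 0
    while True:
        # Breadth-first search for a path with spare capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break  # no augmenting path left: the flow is maximal
        # Bottleneck capacity along the path.
        path_flow = float("inf")
        v = sink
        while parent[v] is not None:
            path_flow = min(path_flow, residual[parent[v]][v])
            v = parent[v]
        # Push that much flow, crediting the reverse edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= path_flow
            residual[v][u] += path_flow
            v = u
        total += path_flow
    # Flow on each pipe = its capacity minus what is left unused.
    flows = {(u, v): c - residual[u][v]
             for u, edges in capacity.items() for v, c in edges.items()}
    return total, flows


pipes = {"s": {"a": 10, "b": 5}, "a": {"b": 15, "t": 10}, "b": {"t": 10}}
total, flows = max_flow(pipes, "s", "t")
print(total)  # 15
```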

If you want to learn how to use spatial SQL to do things like this, check out my online video classes at gisadvisor.

Open Street Map in Manifold Viewer

I recently downloaded the entire OpenStreetMap dataset for the United States.  That’s an 8GB download, but more importantly, it is a 140GB file!  It took over a day to import the file into Manifold (someone on the forum used an SSD and was able to import it in 3.5 hours).  But once the data was in Manifold, I was able to instantly zoom, pan, select, and identify features.  Have a look at the video below to see how fast this was:

This is actually pretty cool: one file, just click, and you are able to work directly with the data.

The ability for Manifold to instantaneously open the 140GB data and zoom and pan around is pretty incredible.  So, my question is: if you had this data locally on your computer, how might you use it?  Comment below…

Keep an eye out for a new course offering on Big Data Analytics with GIS.  I’m hoping to release it before February 1.