PostGIS and Multiprocessing


NYC taxi cab pickup locations for October, 2012

OK, let’s cut to the chase: I used Python’s parallel processing capabilities with Postgres to perform a spatial overlay on approximately 25 million taxi pickup locations (over 5 GB of data), and processed all of it in under 3 minutes!!  There.  Now you can decide if it’s worth your time to read this long post.
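For readers who want a feel for the approach before reading the full post, here is a minimal sketch of one way to fan a spatial overlay out across Python worker processes. It is not the code from the post: the table names (taxi_pickups, taxi_zones), the column names (id, geom, fare, zone_id), the connection string, and the idea of splitting the work by id range are all assumptions made for illustration. The psycopg2 driver is imported inside the worker so the chunking helpers can be tried without a database.

```python
from multiprocessing import Pool

def make_chunks(min_id, max_id, n_chunks):
    """Split the inclusive id range [min_id, max_id] into contiguous ranges."""
    total = max_id - min_id + 1
    size = -(-total // n_chunks)  # ceiling division
    return [(lo, min(lo + size - 1, max_id))
            for lo in range(min_id, max_id + 1, size)]

# Hypothetical schema: every table and column name here is an assumption.
OVERLAY_SQL = """
    SELECT z.zone_id, count(*) AS pickups, sum(p.fare) AS fares
    FROM taxi_pickups AS p
    JOIN taxi_zones AS z ON ST_Contains(z.geom, p.geom)
    WHERE p.id BETWEEN %s AND %s
    GROUP BY z.zone_id
"""

def overlay_chunk(id_range):
    """Run the overlay for one id range on this worker's own connection."""
    import psycopg2  # imported lazily; each process needs its own connection
    conn = psycopg2.connect("dbname=nyc")  # hypothetical connection string
    try:
        with conn.cursor() as cur:
            cur.execute(OVERLAY_SQL, id_range)
            return cur.fetchall()
    finally:
        conn.close()

def merge(partials):
    """Combine per-chunk (zone_id, pickups, fares) rows into one total per zone."""
    totals = {}
    for rows in partials:
        for zone_id, pickups, fares in rows:
            count, fare_sum = totals.get(zone_id, (0, 0))
            totals[zone_id] = (count + pickups, fare_sum + fares)
    return totals

def run_overlay(min_id, max_id, workers=8):
    """Call this from under `if __name__ == "__main__":` on spawn-based platforms."""
    with Pool(processes=workers) as pool:
        return merge(pool.map(overlay_chunk, make_chunks(min_id, max_id, workers)))
```

Each worker opens its own connection because database connections cannot be shared across processes; the partial counts and sums are then added together per zone.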

Denver workshops – a mixture of sadness and joy.

We just completed another two successful Keeping Up with GIS Technology workshops out in Denver.  This was a week mixed with sadness and joy: sadness in that my Mom passed away on Tuesday, and I had to fly out of town to Denver on Wednesday (the show must go on).  But, joy, as I was able to catch up with many former students, classmates, and friends in the Denver area.  Also, relief as I got to spend the morning with my mom, and then she passed away quickly, quietly, and most importantly, painlessly – reunited with my Dad.

Follow up to my big data test – improving PostGIS performance

Just a quick follow-up to my big data test.  If you remember, I was able to determine the number of taxi pickups and the sum of the fares for each zone using Postgres and PostGIS in 1m 40s.  Some of the taxi zones are a little large, so the containment query might actually take a little longer when comparing the bounding boxes in the spatial index.  To get around that, I used ST_SubDivide to break the larger taxi zones into smaller polygons.

This meant that my taxi zone polygons went from 263 to 4,666.  Now, on its face, what idiot would do an overlay with 4,666 polygons when 263 is smaller?  This idiot!  To understand why, read my blog post on When More is Less; you’ll see there is good logic behind the madness.  Well, anyway, that’s what I did, and we went from 1m 40s down to 1m 3s.
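To see why more polygons can be faster, here is a small pure-Python illustration of the bounding-box effect. It does not call PostGIS, and the shapes are made up: an L-shaped "zone" is split by hand into two rectangles, whereas ST_SubDivide chooses its own cuts based on a vertex limit. The point is only that the pieces have much tighter bounding boxes than the whole, so a bounding-box check rejects more points before any exact containment test is needed.

```python
def bbox(ring):
    """Return (xmin, ymin, xmax, ymax) for a list of (x, y) vertices."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)

def bbox_area(ring):
    xmin, ymin, xmax, ymax = bbox(ring)
    return (xmax - xmin) * (ymax - ymin)

def bbox_hit(ring, point):
    """True if the point falls inside the ring's bounding box (the index-style check)."""
    xmin, ymin, xmax, ymax = bbox(ring)
    x, y = point
    return xmin <= x <= xmax and ymin <= y <= ymax

# Made-up L-shaped zone and a hand-made split into two rectangles.
l_shape = [(0, 0), (10, 0), (10, 1), (1, 1), (1, 10), (0, 10)]
pieces = [
    [(0, 0), (10, 0), (10, 1), (0, 1)],
    [(0, 1), (1, 1), (1, 10), (0, 10)],
]
pickup = (5, 5)  # outside the L, but inside the L's bounding box

whole_area = bbox_area(l_shape)                   # box covers the full 10 x 10 square
pieces_area = sum(bbox_area(p) for p in pieces)   # boxes hug the two arms of the L

print(whole_area, pieces_area)
print(bbox_hit(l_shape, pickup), [bbox_hit(p, pickup) for p in pieces])
```

With the single polygon, the pickup passes the bounding-box check and has to go through the exact containment test just to be rejected; with the two pieces, it is rejected by the boxes alone.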

Big Data Results

I wanted to revisit the taxi data example that I previously blogged about.  I had a 6GB file of 16 million taxi pickup locations and 260 taxi zones.  I wanted to determine the number of pickups in each zone, along with the sum of all the fares.  Below is a more in-depth review of what was done, but for those of you not wanting to read ahead, here are the result highlights:

Platform | Command | Time
ArcGIS 10.4 | AddJoinManagement | Out of memory
ArcGIS Pro | Summarize Within | 1h 27m*
ArcGIS Server Big Data GeoAnalytics with Big Data File Share | Summarize Within / Aggregate Points | ~2m
Manifold 9 | GeomOverlayContained | 3m 27s
Postgres/PostGIS | ST_Contains | 10m 30s
Postgres/PostGIS (optimized) | ST_Contains | 1m 40s

*I’m happy ArcGIS Pro ran at this speed, but I think it can do better.  This is a geodatabase straight out of the box. I think we can fiddle with indexes and even structuring the data to get things to run faster.  That is something I’ll work on next week.


Maryland GIS Conference: Workshop Results

We had a very successful workshop on GIS at the Maryland Geospatial Conference – well, actually two workshops.  I was asked to teach a 4-hour workshop on  GIS technology:

The workshop covered 4 different topics in 4 hours: Desktop GIS with QGIS, Server based GIS with Postgres/PostGIS, Developer GIS with Python, and finally Big Data Analytics with GIS.  That’s a lot of material in a short amount of time.  I wondered what the interest would be…


Is a Geography Degree worth it?

Recently, we were asked by our new University President, Chuck Wight, how our students are doing in obtaining their first job.  You have to understand, Chuck is fanatical about the student experience, and frequently gives talks on making college more affordable, and demonstrating that a college degree has value.  This was an excellent question.  What he was really doing was asking what is on the minds of so many parents:

Is this degree that my child is getting worth the money we are going to pay?

I’m embarrassed to say that most of us who heard this question only really had anecdotal evidence.  However, some of us came away from our meeting inspired by Chuck, instead of dejected.  Rather than wait to do something, we reached out to around 60 of our recent graduates (May 2018, December 2017) with a Google poll to see how they were doing immediately after graduation.


Finding “Dangles” with PostGIS

Do you have a set of lines that you need to check for “dangle” nodes?  A dangle is a line segment that overhangs another line segment.  Now, some dangles are valid, like a pipe that terminates in a cul-de-sac.

A few people have posted about this already, but I figured I would give it a shot as well, as I think my SQL is a little more terse.  Anyway, here is the query, and we’ll talk about it line by line:

SELECT DISTINCT g1 AS g INTO dangles
FROM plines, 
    (SELECT g AS g1 FROM  
         (SELECT g, count(*) AS cnt  
          FROM  
              (SELECT  ST_StartPoint(g) AS g FROM plines
               UNION ALL
               SELECT  ST_EndPoint(g) AS g FROM plines ) AS T1 
         GROUP BY g) AS T2
     WHERE cnt = 1) AS T3
WHERE ST_Distance(g1, g) BETWEEN 0.01 AND 2;


Multi-ring (non-overlapping) Buffers with Manifold 9

In my last post, I showed you how to create multi-ring, non-overlapping buffers with spatial SQL in PostGIS.  In this post, I want to do the same thing, but with Manifold GIS.  To be honest, it is pretty much the same thing, although I think Manifold is a little easier because it provides a FIRST aggregate function in SQL, whereas in PostGIS we had to use DISTINCT ON.  Either way, it is pretty easy, so now database and SQL professionals have a way to create multi-ring buffers entirely in SQL.

 If you want to learn more about SQL, programming, open source GIS, or Manifold GIS, check out courses at www.gisadvisor.com.  

Multi-Ring (non-overlapping) Buffers with PostGIS

I was interested in creating multi-ring buffers, but with a twist: I didn’t want the buffers to overlap one another.  In other words, if I have concentric buffers with distances of 100, 200, and 300 around a point, I want those buffers to reflect distances of 0-100, 100-200, and 200-300.  I don’t want them overlapping one another.  You can actually do that with the PostGIS function ST_SymDifference, but there are a few nuances that you have to be aware of.
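To make the 0-100, 100-200, 200-300 idea concrete, here is a tiny pure-Python sketch that works with distances and areas only. It does not build any geometry or call PostGIS, and the function names are made up for illustration.

```python
from math import pi

def ring_bands(distances):
    """Turn buffer distances into non-overlapping (inner, outer) bands."""
    ds = sorted(distances)
    return list(zip([0] + ds[:-1], ds))

def band_for(distance, bands):
    """Return the one band a given distance from the point falls in, or None."""
    for inner, outer in bands:
        if distance <= outer:
            return inner, outer
    return None

def ring_area(inner, outer):
    """Area of the ring between two concentric circles."""
    return pi * (outer ** 2 - inner ** 2)

bands = ring_bands([100, 200, 300])
print(bands)
print(band_for(150, bands))

# Plain buffers overlap, so their areas add up to more than the outer disc;
# the rings tile the outer disc exactly once.
overlapping_total = sum(pi * d ** 2 for d in (100, 200, 300))
ring_total = sum(ring_area(i, o) for i, o in bands)
print(overlapping_total > ring_total)
```

A location 150 units from the point belongs to exactly one ring here, whereas with ordinary buffers it would fall inside both the 200 and the 300 buffer.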

Unlike some of my longer videos, this one will start out with the answer, and then we’ll walk through all the SQL.  You’ll see it isn’t so bad.  And you’ll continue to see that spatial is not special!  It’s only 20 minutes long, but the answer is shown in the first minute.

In the video I’ll slowly walk you through all the spatial SQL to create buffers for the points and trim all the overlaps so that there are no overlapping buffers.  You’ll learn some really cool Postgres commands  including:

 ST_Buffer, ST_SymDifference, DISTINCT ON, and SET WITH OIDS.

I found myself amazed that with a few SQL tweaks, we were able to turn ordinary buffers to more useful non-overlapping buffers.  I hope you enjoy the video.

I’d like to create more videos like that – please leave some comments below so that I know others want me to continue these kinds of tutorials.
