About Art Lembo

I am a Professor at Salisbury University where I teach courses in quantitative geography and GIS.

Bivariate Choropleth Maps with Arcpy

In my previous post, I showed how to prepare the data for a bivariate choropleth map using PostGIS and QGIS. I also indicated that there is a website that shows an ArcGIS tool to do it. But, this actually turns into a good opportunity to illustrate some Python, and how to create the bivariate data using Arcpy.

Arcpy is certainly not as terse as SQL, but it does get the job done, and rather easily. We just have to think about the project a little differently. The code below is a Script tool that I created.

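import arcpy, math

# Script tool parameters: the feature class, then the two fields to classify
fc = arcpy.GetParameterAsText(0)
f1 = arcpy.GetParameterAsText(1)
f2 = arcpy.GetParameterAsText(2)

numrecs = int(arcpy.GetCount_management(fc).getOutput(0))

# Add the 3-character text field "bimode" if it does not already exist
fields = arcpy.ListFields(fc, "bimode")
if len(fields) != 1:
    arcpy.AddField_management(fc, "bimode", "TEXT", field_length=3)

# Pass 1: walk the rows sorted by the first field; each row's rank i
# is turned into a tercile (1, 2, or 3)
var1 = arcpy.UpdateCursor(fc, sort_fields=f1)

i = 1
for row in var1:
    row.setValue("bimode", str(int(math.ceil((float(i) / float(numrecs)) * 3.0))))
    var1.updateRow(row)
    i = i + 1
del var1  # release the cursor before opening the next one

# Pass 2: same again, sorted by the second field, appending ".<tercile>"
var2 = arcpy.UpdateCursor(fc, sort_fields=f2)

i = 1
for row in var2:
    row.setValue("bimode", row.getValue("bimode") + "." + str(int(math.ceil((float(i) / float(numrecs)) * 3.0))))
    var2.updateRow(row)
    i = i + 1
del var2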
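To see what the rank formula in the script does without opening ArcGIS, here is a minimal pure-Python sketch. It is not part of the Script tool: the function name rank_classes and the sample values are made up for illustration, and it uses integer arithmetic for the ceiling instead of math.ceil on floats (same result mathematically, without rounding surprises).

```python
def rank_classes(values, classes=3):
    """Return a class (1..classes) per value, based on its rank when sorted."""
    n = len(values)
    # Positions of the values in ascending order (ties keep input order).
    order = sorted(range(n), key=lambda idx: values[idx])
    result = [0] * n
    for i, idx in enumerate(order, start=1):
        # Integer form of ceil(i / n * classes), the formula in the cursor loop.
        result[idx] = (i * classes + n - 1) // n
    return result


# Hypothetical sample values, one entry per feature.
obese = [31.2, 24.5, 28.0, 35.1, 22.3, 29.9]
diabetes = [10.1, 7.2, 11.5, 12.0, 6.8, 9.3]

# Combine the two classes into the "a.b" code stored in the bimode field.
bimode = ["%d.%d" % (a, b)
          for a, b in zip(rank_classes(obese), rank_classes(diabetes))]
print(bimode)
```

Each feature ends up with one of nine codes from 1.1 to 3.3, which is what the renderer needs for a 9-color legend.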

Easy bivariate map with Postgres

A friend recently asked me about the cool looking bivariate maps produced with ArcOnline, lamenting that the capability seemed lacking in ArcGIS. Well, it turns out that ESRI has a .dll you can use, and there is a good article here. So, if you want to create these great looking maps in ArcGIS, it shouldn’t be a problem. The website will allow you to download the .dll, and you can also watch the video on how to use it. Well worth the time.

So of course that got me thinking: could we make the same map with spatial SQL? Well, sure, and it is super easy. If you want to get spun up on what these 9-color bivariate choropleth maps are, and the theory behind them, have a look at this great site. Josh does a great job explaining how this works, but it is a little cumbersome if you want to pump out map after map. But, with a very little bit of SQL, you can easily pull it off.

Let’s start with our data: I have a Postgres table of United States County boundaries, with the attributes percentobese and percentdiabetes, along with a FIPS code and a geometry column.

To prepare the data for the map, we simply issue this SQL query:

SELECT 
fips,geom,
ntile(3) over (order by percentobese) || '.' ||
ntile(3) over (order by percentdiabetes) AS bimode
INTO qlayer
FROM ushealthrisk

Yes, that’s it. Really.

I’m not kidding. We’re done, folks. Go home, nothing left to see.

Well, if you insist on reading, I’ll tell you what the SQL does, and how to actually visualize the data.
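As a rough picture of what ntile(3) is doing in that query, here is a small Python sketch. It is an illustration, not the post's own explanation: the function and the sample numbers are made up, and it assumes the standard SQL behavior where rows are sorted, split into buckets as evenly as possible, and any leftover rows go to the first buckets. Ties are broken by input order here, whereas the database makes no promise about tie order.

```python
def ntile(values, buckets=3):
    """Bucket number (1..buckets) per value, mimicking ntile() OVER (ORDER BY value)."""
    n = len(values)
    base, extra = divmod(n, buckets)
    order = sorted(range(n), key=lambda i: values[i])
    result = [0] * n
    pos = 0
    for b in range(1, buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if b <= extra else 0)
        for idx in order[pos:pos + size]:
            result[idx] = b
        pos += size
    return result


# Hypothetical values standing in for percentobese and percentdiabetes.
percentobese = [31.0, 24.0, 28.0, 35.0, 22.0, 30.0, 26.0]
percentdiabetes = [10.0, 7.0, 11.5, 12.0, 6.5, 9.0, 8.0]

# Same idea as: ntile(3) over (...) || '.' || ntile(3) over (...)
bimode = [str(a) + "." + str(b)
          for a, b in zip(ntile(percentobese), ntile(percentdiabetes))]
print(bimode)
```

With 7 rows the buckets hold 3, 2, and 2 rows, so the lowest class is the one that absorbs the remainder.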


Ready for a new career?

I just got off the phone with a very large developer. I want to call him a GIS developer, but really that’s not what he does. Rather, he is a spatial solutions developer. According to him, when we talk about GIS, we are mostly talking about a particular software product, or products. He feels like the term GIS pigeonholes him. In his world, he solves really large spatial problems, and thus doesn’t think in terms of GIS, but rather spatial solutions. Who am I to argue with him? They have hundreds of spatial scientists working for them.

It turns out his company moved all of their spatial analysis tasks to FOSS4g. He was calling me because the guy that oversees a lot of their FOSS4g work said he was inspired by the courses I have on gisadvisor.com. That was humbling, for sure. But what really struck me was in the middle of our conversation, when we were talking about the courses I teach at the University, and he said:

Art, if you had 9 or 10 students graduating with those FOSS4g skills, I could set them up with a job in under two weeks.

In fact, he said they, and others like them, need hundreds of new employees. It was astounding to me to hear about all the really cool spatial work going on out there, without the term GIS being used, and how much of it was using FOSS4g, and how much of it was being used in really large organizations that would never even talk about it publicly.

It made me think about the courses and workshops I offer, and wanting to help create a path for professionals to move from traditional GIS to that of spatial data analysts with FOSS4g experience. I know in the past that people have taken my courses (e.g. Python for FOSS4g, PostGIS, Geoserver, etc.) and gotten spun up to get an interview doing spatial database work, and even obtain a job in the field.

So, the picture below shows the FOSS4g courses I offer, and if you are willing to disappear for a few weekends or evenings, you can get spun up on all these technologies and be ready to move into the exciting FOSS4g world.

Six FOSS4g courses offered by gisadvisor.com to help you gain the skills to move into a high performing spatial analysis field

I’m hoping that he might be willing to share a little more publicly, as I’d like to see students cycle through these 6 courses (and my upcoming Big Data Analytics with FOSS4g) and then hopefully walk into some new careers. I’m actually testing this approach with a couple of people now, so we’ll see how much they can learn in a month or two. All in all, I’m almost up to 10,000 students overall, and the ratings for the courses are 4.3 out of 5!

Engaging students with multiuser editing, and FOSS4g

A friend is doing some research on estrogen and estradiol in water samples in Delaware. We have about 50 sub-watersheds that show the upstream contributing area for each water sample. We’ve done some regression analysis to see if we can find any correlation of estrogen with landscape parameters, like land use.


Geography students collaborating on data collection for a research project. Notice there are no windows in this lab – this is the best way to get students to want to go to graduate school so they don’t have to work like this again!


So far, we haven’t been impressed with the results. Nonetheless, I’m not ready to give up, so I thought:

What if there is a relationship with the number of poultry houses upstream of each sample?

The bad news: we don’t have a layer of poultry houses in Delaware.
The good news: I have a bunch of eager students who want to learn GIS.
The solution: Let’s use a lecture to teach students about multi-user, simultaneous editing with Postgres and PostGIS, and have them do it.

So that’s what we did. On Wednesday, I took 18 of my Advanced GIS students, and introduced them to an 8 minute video that would step them through digitizing all the poultry houses in the watershed. We had Postgres running on the teaching computer, and QGIS on 18 workstations in the classroom. Our basic setup looked like this (with 18 QGIS workstations, of course):

I then had to instruct my students how to do the digitizing. Since people catch on at different paces, the best thing was to just use this video:

After 1h 40m, the students had around 1,000 poultry houses digitized! And, as a learning experience, they had the opportunity to see how an enterprise class database could be stood up in a matter of minutes to facilitate multi-user digitizing. It blew their minds to realize that after 2 hours, we had actually put in 36 man hours of digitizing. They had never used QGIS before, and it only took an 8 minute video to spin them up on the project!

This was easy to do, excited the students to be exposed to multi-user editing and open source GIS, and accomplished an important task for a research project that an individual student is working on. Everyone wins!

I hope this inspires you to come up with some creative ways to introduce GIS concepts to your class.

If you want to learn more about how to build an enterprise GIS with open source tools, check out my courses on www.gisadvisor.com

PostGIS and Multiprocessing


NYC taxi cab pickup locations for October, 2012

OK, let’s cut to the chase: I used Python’s parallel processing capabilities with Postgres to perform a spatial overlay on approximately 25 million taxi pickup locations (over 5 GB of data), and processed all of it in under 3 minutes! There. Now you can decide if it’s worth your time to read this long post.
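The excerpt does not show how the work was split up, so the following is only a sketch of one common pattern, not the method used in the post: cut the row IDs into contiguous slices and hand each slice to its own worker process. The names id_ranges, process_slice, and run_parallel are hypothetical, and process_slice is a stand-in that just counts IDs where a real worker would open its own database connection and run the overlay query for its slice.

```python
from multiprocessing import Pool


def id_ranges(min_id, max_id, parts):
    """Split the inclusive range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, parts)
    ranges = []
    start = min_id
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        if size == 0:
            continue  # more workers than rows
        ranges.append((start, start + size - 1))
        start += size
    return ranges


def process_slice(bounds):
    """Stand-in for the per-worker job (assumption: one query per ID slice)."""
    lo, hi = bounds
    return hi - lo + 1


def run_parallel(min_id, max_id, workers=4):
    """Run process_slice on every slice in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return sum(pool.map(process_slice, id_ranges(min_id, max_id, workers)))
```

Calling run_parallel(1, 25000000) from under an `if __name__ == "__main__":` guard would fan the slices out to four processes. The point of slicing by ID is that no two workers ever touch the same rows.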

Denver workshops – a mixture of sadness and joy.

We just completed another two successful Keeping Up with GIS Technology workshops out in Denver. This was a week mixed with sadness and joy: sadness in that my Mom passed away on Tuesday, and I had to fly out of town to Denver on Wednesday (the show must go on). But, joy, as I was able to catch up with many former students, classmates, and friends in the Denver area. Also, relief as I got to spend the morning with my mom, and then she passed away quickly, quietly, and most importantly, painlessly – reunited with my Dad.

Follow up to my big data test – improving PostGIS performance

Just a quick follow-up to my big data test. If you remember, I was able to determine the number of taxi pickups and the sum of the fares for each zone using Postgres and PostGIS in 1m 40s. Some of the taxi zones are a little large, so the containment query might actually take a little longer when comparing the bounding boxes in the spatial index. To get around that, I used ST_SubDivide to break the larger taxi zones into smaller polygons.

This meant that my taxi zone polygons went from 263 to 4,666. Now, on the face of it, what idiot would do an overlay with 4,666 polygons when 263 is smaller? This idiot! To understand this, you should read my blog post on When More is Less; you’ll see there is good logic behind the madness. Well, anyway, that’s what I did, and we went from 1m 40s down to 1m 3s.
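A toy illustration of why more, smaller polygons can be faster: a spatial index first filters on bounding boxes, and a big irregular polygon has a bounding box full of empty space. The sketch below is not PostGIS code. It assumes a hypothetical L-shaped zone made of two rectangles and a made-up grid of pickup points, and simply counts how many points survive the bounding-box filter in each case.

```python
def bbox(rects):
    """Bounding box (xmin, ymin, xmax, ymax) enclosing a list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def in_box(pt, box):
    """True if the point lies inside the box (edges included)."""
    x, y = pt
    return box[0] <= x <= box[2] and box[1] <= y <= box[3]


# Hypothetical L-shaped zone, already split into two rectangular pieces.
pieces = [(0, 0, 10, 2), (0, 2, 2, 10)]

# Hypothetical pickup points on a 10 x 10 grid, offset to avoid edges.
points = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]

# Unsplit: the index only knows the bounding box of the whole L.
whole_candidates = [p for p in points if in_box(p, bbox(pieces))]

# Subdivided: each piece brings its own, much tighter, bounding box.
piece_candidates = [p for p in points if any(in_box(p, r) for r in pieces)]

print(len(whole_candidates), len(piece_candidates))
```

Here the single bounding box passes all 100 points on to the exact containment test, while the two tight boxes pass only the 36 points that are really inside the zone.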

Big Data Results

I wanted to revisit the taxi data example that I previously blogged about.  I had a 6GB file of 16 million taxi pickup locations and 260 taxi zones.  I wanted to determine the number of pickups in each zone, along with the sum of all the fares.  Below is a more in-depth review of what was done, but for those of you not wanting to read ahead, here are the result highlights:

Platform | Command | Time
ArcGIS 10.4 | AddJoinManagement | Out of memory
ArcGIS Pro | Summarize Within | 1h 27m*
ArcGIS Server Big Data GeoAnalytics with Big Data File Share | Summarize Within / Aggregate Points | ~2m
Manifold 9 | GeomOverlayContained | 3m 27s
Postgres/PostGIS | ST_Contains | 10m 30s
Postgres/PostGIS (optimized) | ST_Contains | 1m 40s

*I’m happy ArcGIS Pro ran at this speed, but I think it can do better. This is a geodatabase straight out of the box. I think we can fiddle with indexes and even structuring the data to get things to run faster. That is something I’ll work on next week.


Maryland GIS Conference: Workshop Results

We had a very successful workshop on GIS at the Maryland Geospatial Conference – well, actually two workshops. I was asked to teach a 4-hour workshop on GIS technology.

The workshop covered 4 different topics in 4 hours: Desktop GIS with QGIS, Server based GIS with Postgres/PostGIS, Developer GIS with Python, and finally Big Data Analytics with GIS. That’s a lot of material in a short amount of time. I wondered what the interest would be…
