If you are like me, you love writing spatial SQL. But what can you do if you have a geodatabase, some shapefiles, or any other data format? Well, you can always take those data formats and import them into Postgres or SQLite, and then issue the SQL for the results. Sure, it works, but it’s an extra step, and you no longer have access to the desktop GIS tools.
Well, with QGIS, you can drag and drop any data format into your project. And, with the DB Manager, you can refer to those layers as Virtual Layers, allowing you to issue SQL on them. Take a look at this video to see how issuing SQL on disparate data sources is a breeze.
I still think Manifold has the best integrated SQL engine tied to their GIS, but virtual layers in QGIS is seriously compelling, especially with the drag-and-drop capabilities of bringing in the different data sets.
A friend recently asked me about the cool looking bivariate maps produced with ArcOnline, lamenting that the capability seemed lacking in ArcGIS. Well, it turns out that ESRI has a .dll you can use, and there is a good article here. So, if you want to create these great looking maps in ArcGIS, it shouldn’t be a problem. The website will allow you to download the .dll, and you can also watch the video on how to use it. Well worth the time.
So of course that got me thinking: could we make the same map with spatial SQL. Well, sure, and it is super easy. If you want to get spun up on what these 9 color bivariate choropleth maps are, and the theory behind it, have a look at this great site. Josh does a great job explaining how this works, but it is a little cumbersome if you want to pump out map after map. But, with a very little bit of SQL, you can easily pull it off.
Let’s start with our data: I have a Postgres table of United States County boundaries, with the attributes percentobese and percentdiabetes, along with a FIPS code and a geometry column.
To prepare the data for the map, we simply issue this SQL query:
SELECT fips,geom, ntile(3) over (order by percentobese) || '.' || ntile(3) over (order by percentdiabetes) AS bimode INTO qlayer FROM ushealthrisk
yes, that’s it. Really.
I’m not kidding. We’re done, folks. Go home, nothing left to see.
Well, if you insist on reading, I’ll tell you what the SQL does, and how to actually visualize the data.
A friend is doing some research on estrogen and estrodial in water samples in Delaware. We have about 50 sub-watersheds that show the upstream contributing area for each water sample. We’ve done some regression analysis to see if we can find any correlation of estrogen with landscape parameters, like land use.
So far, we haven’t been impressed with the results. Nonetheless, I’m not ready to give up, so I thought:
what if there is a relationship with the number of poultry houses upstream of each sample.
The bad news: we don’t have a layer of poultry houses in Delaware. The good news: I have a bunch of eager students who want to learn GIS. The solution: Let’s use a lecture to teach students about multi-user, simultaneous editing with Postgres and PostGIS, and have them do it.
So that’s what we did. On Wednesday, I took 18 of my Advanced GIS students, and introduced them to an 8 minute video that would step them through digitizing all the poultry houses in the watershed. We had Postgres running on the teaching computer, and QGIS on 18 workstations in the classrooms. Our basic setup looked like this (with 18 QGIS workstations, of course):
I then had to instruct my students how to do the digitizing. Since people catch on at different paces, the best thing was to just use this video:
After 1h 40m, the students had around 1,000 poultry houses digitized! And, as a learning experience, they had the opportunity to see how an enterprise class database could be stood up in a matter of minutes to facilitate mult-user digitizing. It blew their mind to realize that after 2 hours, we had actually put in 36 man hours of digitizing. They had never used QGIS before, and it only took an 8 minute video to spin them up on the project!
This was easy to do, excited the students to be exposed to multi-user editing and open source GIS, and accomplished an important task for a research project that an individual student is working on. Everyone wins!
I hope this inspires you to come up with some creative ways to introduce GIS concepts to your class.
If you want to learn more about how to build an enterprise GIS with open source tools, check out my courses on www.gisadvisor.com
I wanted to revisit the taxi data example that I previously blogged about. I had a 6GB file of 16 million taxi pickup locations and 260 taxi zones. I wanted to determine the number of pickups in each zone, along with the sum of all the fares. Below is a more in-depth review of what was done, but for those of you not wanting to read ahead, here are the result highlights:
Out of memory
ArcGIS Server Big Data GeoAnalytics with Big Data File Share
*I’m happy ArcGIS Pro ran at this speed, but I think it can do better. This is a geodatabase straight out of the box. I think we can fiddle with indexes and even structuring the data to get things to run faster. That is something I’ll work on next week.
Do you have a set of lines that you need to determine if there are any “dangle” nodes? A dangle is a line segment that overhangs another line segment. Now, some dangles are valid, like a pipe that terminates in a cul-de-sac.
A few people have posted about this already, but I figured I would give it a shot as well, as I think my SQL is a little more terse. Anyway, here is the query, and we’ll talk about it line by line:
SELECT DISTINCT g1 AS g
(SELECT g AS g1 FROM
(SELECT g, count(*) AS cnt
(SELECT ST_StartPoint(g) AS g FROM plines
UNION ALL SELECT ST_EndPoint(g) AS g FROM plines ) AS T1
GROUP BY g) AS T2
WHERE cnt = 1) AS T3
WHERE ST_Distance(g1, g) BETWEEN 0.01 AND 2;
I was interested in creating mult-ring buffers but with a twist: I didn’t want the buffers to overlap one another. In other words, if I had concentric buffers with distances of 100, 200, and 300 around a point, I want those buffers to reflect distances of 0-100, 100-200, and 200-300. I don’t want them overlapping one another. You can actually do that with the PostGIS function ST_SymDifference, but there are a few nuances that you have to be aware of.
Unlike some of my longer videos, this one will start out with the answer, and then we’ll walk through all the SQL. You’ll see it isn’t so bad. And, you continue to see that spatial is not special!. It’s only 20 minutes long, but the answer is shown in the first minute.
In the video I’ll slowly walk you through all the spatial SQL to create buffers for the points and trim all the overlaps so that there are no overlapping buffers. You’ll learn some really cool Postgres commands including:
ST_Buffer, ST_DifferenceSym, DISTINCT ON, and SET WITH OIDS.
I found myself amazed that with a few SQL tweaks, we were able to turn ordinary buffers to more useful non-overlapping buffers. I hope you enjoy the video.
I’d like to create more videos like that – please leave so comments below so that I know others want me to continue these kinds of tutorials.
If you want to learn more about SQL, programming, open source GIS, or Manifold GIS, check out courses at www.gisadvisor.com.
Today I want to introduce you to another one of my students, Meghan Murphy. Meghan is an outstanding student, and one of the top undergraduates I have ever worked with (I know, I say that a lot, but they just keep getting better and better). Even as a Sophomore, Meghan was always helping other students out, even the Seniors – students would seem to wait for Meghan to organize everyone together to study for upcoming exams.
She also has an innate ability to work with GIS, and pick up new things: one day she has never programmed in Python, and the next day, she has a couple of hundred line Python script created and running in ArcGIS! So, I was so happy when Meghan said she wanted to take a special course in Open Source GIS that I was offering this semester. We covered QGIS, Postgres/PostGIS, GDAL, and Geoserver. For her final project, Meghan decided she wanted to compare the cartography capabilities of ArcGIS and QGIS, and make a video about it (maybe she was inspired by my videos, or maybe she just figured after watching Lembo’s videos, how could I do worse!).
Whatever her reason, like everything else she does, this turned out great, especially since she had never done a live tutorial like this. So, I encourage you to watch the side-by-side comparisons for creating a basic cartographic product in both ArcGIS and QGIS. It’s about 40 minutes long, but worth every minute: I found that I learned some things I hadn’t known regarding some cartographic tools. And, on that note, I’ll have more videos from my undergraduates shortly (some built web maps, others built an enterprise GIS with Postgres.
If you want to learn more about open source GIS, Python programming, Spatial SQL, or Spatial Statistics, check out my online courses at www.gisadvisor.com.
This is a continuation of Mark Balwanz’s blog posts on his creation of web mapping sites using both ESRI and Geoserver. Today he will talk about his experience creating the site using open source technology.
Over the first three parts of this blog series (here, here, and here), I have laid out my project plan, walked you through my ESRI implementation, and my Open Source implementation. During this fourth and final blog I plan on sharing with you my overall thoughts on both implementations in regards to what I liked and disliked. As I mentioned in Part 1 of this series, everything I share here is just my opinion and is based on this one project. I also want to point out that most of my previous experience, both academic and professional, is based in ESRI and is probably shaping some of my opinions.
As someone who has spent most of his GIS life working with ESRI, I have to admit it was a little uncomfortable to move into the Open Source world. However, being uncomfortable was a good thing as it pushed me to learn a lot of new technologies. I think Open Source can be a great way for organizations to start using GIS as there is a lot less upfront cost and with a little research you can find the Open Source project that best fits your need.
I hope you have enjoyed this blog series and have maybe learned something that you can use. Please feel free to leave comments if you have any questions and thanks for reading!
The Maryland’s Geospatial Conference (#TUgis2017) is on March 20/21, 2017. I first attended TUgis in 1990, and it is always a great conference. It is not too large, so it is great way to have extended time with people. So, if you had a technical question for someone from say ESRI, you could simply stop by their booth and have a chat.
This year I was asked to support the pre-conference workshops. I will be presenting two workshops with the help of my students. If you recall, my students are quite good at instructing others about GIS technology. I’m really looking forward to the conference and interacting with people during the workshop. Keep in mind, this is not something we are just throwing together – we’ve been spending a lot of time thinking about how to effectively move people through the material so that beginners do not get lost, and more technically savvy people are sufficiently challenged. We are fanatical about making sure people’s learning experience is excellent.
A description of the courses are found here:
Spatial SQL: A Language for Geographers: Are you stuck in a rut of only knowing how to use a GIS GUI? Do you want to learn how to automate tasks, but are afraid of computer programming. If so, SQL is the most powerful tool you can learn to help you perform complex GIS tasks. This hands-on course is designed to teach you how SQL can replicate many database and GIS tasks. We will start at a very basic overview and then proceed to more advanced topics related to GIS.
Topics to include:
Spatial is NOT Special
SQL Data Types
Spatial SQL for Vector and Raster Analysis
Spatial SQL for Classic Geographic Analysis
For this class, we’ll be using spatiaLite which is the spatial extension used with SQLite. This is a great way to get started, as it is very similar to the functionality of Postgres/PostGIS. If you want to move to enterprise GIS with Postgres or even Oracle or SQLServer, you’ll be in really good shape.
Python for Geospatial: If you are in the field of GIS, you’ve probably heard everyone talking about Python, whether it’s Arcpy in ArcGIS or special Python packages for doing things in open source. In this hands-on workshop you will learn how Python is used to perform GIS analysis. The workshop will be an introduction to Python, with emphasis on integrating multiple Python plug-ins with ArcGIS and open source GIS.
Topics to include:
An overview of Python (variables, statements, I/O, writing code)
Python plug-ins for Geospatial (numpy, geocoder, pygal, Postgres)
A Taste of Arcpy
A Data Analytics Project with Python (for this, we will geocode addresses using Python, perform analysis with open source GIS, take the results into Arcpy to do more GIS analysis, compute statistical results with Python calling Excel, and then create charts and graphs of the results for use on the Internet—without ever opening up a single GIS product.)
The semester was going so well, and the students were so responsive to anything I asked, I said what the heck, let’s try something crazy. So, I showed the students how to use two Python geocoding packages (geocoder and censusgeocode) and then said:
why don’t we conduct a research project over the next week to test the match rates and positional accuracies of the Google API and the United Census Bureau API.
So yeah, I gave them a week to put this together: design, code, analyze, and write. And, like most of my students at this level, they didn’t disappoint me. This meant they had to integrate a lot of what they have learned over the years (programming, GIS, statistics, etc.).
I was surprised by how little there is out there in terms of quantitative assessment of geocoding accuracies. I hope you have a chance to click on the link and check out the working paper (we will submit it to a journal sometime soon). Also, I included a short abstract below so that you can see the results of our work (note: our paper includes the original data and the source code for performing the geocoding):
Undergraduate Research in Action: Evaluating the positional differences between the Google Maps and the United States Census Bureau geocoding APIs
Abstract: As part of a class assignment in GIS Programming at Salisbury University, students evaluated 106 geographically known addresses to determine the match rate and positional accuracy obtained using the Google and the United States Census Bureau geocoding application programming interface (API)s. The results showed that 96.2% of the addresses supplied by the Google API were successfully geocoded, while 84% of the addresses supplied by the Census Bureau API were successfully geocoded. Further, the Google API matched 90% of the addresses with a ROOFTOP designation. The average positional accuracy of the Google derived addresses were 80m overall, and 65m for those geocoded with the ROOFTOP designation while the Census Bureau positional accuracy was 271.09m.
So yeah, this is what you can do with 7 GIS undergraduates at Salisbury University: they work hard, fast, and are a very creative bunch.