Big Data Results

I wanted to revisit the taxi data example that I previously blogged about.  I had a 6GB file of 16 million taxi pickup locations and 260 taxi zones.  I wanted to determine the number of pickups in each zone, along with the sum of all the fares.  Below is a more in-depth review of what was done, but for those of you not wanting to read ahead, here are the result highlights:

Platform Command Time
ArcGIS 10.4 AddJoinManagement Out of memory
ArcGIS Pro Summarize Within 1h 27m*
ArcGIS Server Big Data GeoAnalytics with Big Data File Share Summarize Within

Aggregate Points

~2m
Manifold 9 GeomOverlayContained 3m 27s
Postgres/PostGIS ST_Contains 10m 30s
Postgres/PostGIS (optimized) ST_Contains 1m 40s
*I’m happy ArcGIS Pro ran at this speed, but I think it can do better.  This is a geodatabase straight out of the box. I think we can fiddle with indexes and even structuring the data to get things to run faster.  That is something I’ll work on next week.

Continue reading

Finding “Dangles” with PostGIS

Do you have a set of lines that you need to determine if there are any “dangle” nodes?  A dangle is a line segment that overhangs another line segment.  Now, some dangles are valid, like a pipe that terminates in a cul-de-sac.

A few people have posted about this already, but I figured I would give it a shot as well, as I think my SQL is a little more terse.  Anyway, here is the query, and we’ll talk about it line by line:

SELECT DISTINCT g1 ASINTO dangles
FROM plines, 
    (SELECT g AS g1 FROM  
         (SELECT g, count(*) AS cnt  
          FROM  
              (SELECT  ST_StartPoint(g) AS g FROM plines
               UNION ALL
               SELECT  ST_EndPoint(g) AS g FROM plines ) AS T1 
         GROUP BY g) AS T2
     WHERE cnt = 1) AS T3
WHERE ST_Distance(g1, g) BETWEEN 0.01 AND 2;

Continue reading

Multi-Ring (non-overlapping) Buffers with PostGIS

I was interested in creating mult-ring buffers but with a twist: I didn’t want the buffers to overlap one another.  In other words, if I had concentric buffers with distances of 100, 200, and 300 around a point, I want those buffers to reflect distances of 0-100, 100-200, and 200-300.  I don’t want them overlapping one another.  You can actually do that with the PostGIS function ST_SymDifference, but there are a few nuances that you have to be aware of.

Unlike some of my longer videos, this one will start out with the answer, and then we’ll walk through all the SQL.  You’ll see it isn’t so bad.  And, you continue to see that spatial is not special!.  It’s only 20 minutes long, but the answer is shown in the first minute.

In the video I’ll slowly walk you through all the spatial SQL to create buffers for the points and trim all the overlaps so that there are no overlapping buffers.  You’ll learn some really cool Postgres commands  including:

 ST_BufferST_DifferenceSymDISTINCT ON, and SET WITH OIDS.

I found myself amazed that with a few SQL tweaks, we were able to turn ordinary buffers to more useful non-overlapping buffers.  I hope you enjoy the video.

I’d like to create more videos like that – please leave so comments below so that I know others want me to continue these kinds of tutorials.

 If you want to learn more about SQL, programming, open source GIS, or Manifold GIS, check out courses at www.gisadvisor.com.  

GIS Analysis of Overlapping Layers

overlayoverlapMy friend is attempting to quantify the area of different landuse values for different areas that are upstream from her sample points.  This means she needs sample points, landuse, and upstream areas (i.e. sub-watersheds).  The problem is, her watersheds overlap, the buffer distances around the sample points overlap themselves AND the watersheds, and she then needs to summarize the results.  It’s actually a tricky problem due to the overlaps: GIS software doesn’t really like when features within a single layer overlap one another.  Also, if a buffer for a sample point overlaps two different watersheds, that becomes tricky too.

Sure you can solve it with a few for loops,  inserting the results into a new table, but that really is a hassle.  Also, I have to do it for different distances and different land cover types.

So, I once again turned to SQL – remember what I keep telling you – spatial is not special.  It’s just another data type.  This video steps you through performing a multi-ring buffer on overlapping objects from 3 different layers: sample points, watersheds, and land use.  As we step through the SQL, you’ll see how easy it is to put the query together.  And, at the end, you’ll see how flexible the query is should you want to change your objectives.  And, for good measure, we’ll throw in a little bit of parallel processing.

Big Data GeoAnalytics – Turning Points to Lines

In my last video, I gave a short of mile-high view of how SQL can be used for big data geoanalytics.  I want to dive a little deeper, and explore the idea of create linear features from a time-series of points.

Once again, using some basic SQL and spatial SQL, we can perform basic time-series analysis.

I’m enjoying making these videos, as they are helping me put my course on big data and GIS together.  I hope you like them too.  Please comment down below so that I know this is something the user community enjoys and is learning from.

Also, if you are interested in learning more about how to perform spatial SQL in Microsoft SQL Server, Postgres, or Manifold, visit my other site, www.gisadvisor.com to sign up for my online video courses.

Big data geo-analytics with SQL

I’m getting ready to create a course in big data analytics with GIS.  I have lots of ideas as to what to do, but one thing I know is that I will be using spatial databases and SQL.  I’ll also be using Manifold Future.

ESRI has recently introduced their ArcGIS GeoAnalytics Server, which will introduce many GIS professionals to big data analytics with GIS.  They have some interesting scenarios and example data using NYC taxi cabs.  I think these will be really good case studies.

This video (just shy of 20 minutes) will use SQL and Manifold to try and address these big data problems.

Keep an eye on my blog as I will be rolling out new ideas as I prepare my course for the Spring.

if you like the video, and want to learn more about how to improve your spatial database skills, check out my videos at www.gisadvisor.com.

Maryland State GIS Conference (TuGIS)

The TuGIS training workshop on March 20, 2017 is completed – you can see the workshop evaluations below:  

The workshop evaluations are in

(if you want to cut to the chase, the workshop results are here).

I had a great time teaching our two workshops at the TuGIS conference.  In the morning, my students and I presented Spatial SQL: A Language for Geographers, and in the afternoon we taught Python for Geospatial.

We knew expectations would be high: both courses sold-out in 2 days, and we even expanded the class size to 38 people for each workshop!!  I knew that teaching 38 people would be a challenge, but it would also be a great lesson to see if we could corral so many cats into a single, technical workshop.  The workshop evaluations would be crucial to determine if we met our objectives.

The workshop evaluations were overwhelmingly positive.  For example:

  1. over 90% said they enjoyed the workshop.
  2. over 83% said it was much better than other GIS training they have been to.
  3. on a scale of 1-10, 95% of the attendees rated the course a 7 or above.
  4. 93% said they learned something new in the workshop.
  5. 89% said the workshops would help them in their careers.
  6. 91% said they would apply these skills to their job.

I decided to throw one curve-ball on the evaluation sheet and asked:

This was a half-day workshop. Most one-day GIS training classes cost around $600/day. If we developed other in-depth full-day workshops on topics like this for under $250, how likely would you be to participate in it?

it turned out that 89% of the respondents rated a 7 or higher, indicating that almost 90% of the people valued the training enough to pay $250 for a full day course (opposed to $600 for most GIS courses).  This means it is possible to offer really good, low cost training to GIS professionals.  Keep an eye out on this, as I am very likely to take these training classes on the road.

The comments the participants provided were great – it confirmed our belief that this was an excellent training course, and that the course needed to be expanded to 8 hours, rather than 4 hours – most everyone felt like their was simply too much information to absorb.

If you would like to see the results of the workshop evaluation, click the link below:

TuGIS Workshops – Google Forms

Finally, if you can’t make it to a live workshop, all of my video training courses are $30 or less, when you visit www.gisadvisor.com.  These courses can’t get into the level of depth that a live course gives, but you’ll see that after thousands of students taking the courses, close to 90% of them give the course 4 starts out of 5!

 

Undergraduate Geospatial Python Projects

earthballThis week my GIS Programming students presented their programming projects to ESRI. First, I cannot say enough to thank ESRI for taking time out of their schedule to meet with our students – the staff was helpful, encouraging, and provided great feedback to the students – what an honor it was to get their feedback.  I am so thankful to be a part of a GIS community that is so supportive of one another.

Now, this was a really special class of undergraduates – and some of them were part of that special group of students that presented their research at an undergraduate conference.  It was small, so we could do some really cool things.  In fact, in the middle of the semester, the students wrote a paper comparing the geocoding accuracies of Google Maps and the United States Census Bureau.

Things were going so well that I decided in lieu of a final exam, we expanded their final projects a little more, and arranged for the staff at ESRI Charlotte and ESRI Redlands to join us on a WebEx that included demonstrations and a code walk-through.  Below are each students’ presentation, and some of the Q&A from ESRI:

noahNoah Krach.  Noah is an amazing undergraduate.  Recently, we lost one of our graduate research associates, and Noah stepped in to provide technical support on a National Science Foundation project in Lake Victoria.  Without missing a beat, Noah was all over the project, and he used his time in my class to create an Arcpy tool to extract, translate, and load (ETL) gigabytes of Landsat imagery.  This tool does a lot, and I can’t even begin to describe all he did, you’ll simply have to watch and learn.

Check out his video, and you’ll see why we are so excited that Noah will be around for another semester.

cc

Caitlin Curry.  If you follow my blog, you’ve already met Caitlin.  She finished her summer internship I told you about, and during the middle of it, her boss wrote us to say what an excellent worker she was (he prefaced his email by saying he never does that, but was so impressed with Caitlin, he had to let us know).  We are impressed with Caitlin, too.  And, as I have now grown to expect, Caitlin did an amazing job with another ETL type tool using Arcpy, where she downloaded, unzipped, and processed earthquake data and critical infrastructure.

I did a lot of emergency response work with earthquakes in a previous life, and what Caitlin did here would have been so useful.  I think you will enjoy seeing how she integrated many different Python packages with Arcpy to provide an early warning application for emergency responders.  And just as a heads-up, Caitlin uses Python to download everything while the script is running – so you just give the script to a user and it works without any operator knowledge of the underlying data = really cool, and efficient.

mb

Matthew Bucklew.  After my first lecture this semester, Matt told me he built his own computer this summer – just for fun.  So, I knew he wasn’t your ordinary  geographer – he likes to try new things, and if something is done in a conventional way, Matt is going to try and be more innovative.  Matt created a great Arcpy application to locate renewable engery stations needed by automobiles.  His Python scripts use ArcGIS for analysis, but at the same time, seamlessly brings in the Google APIs to provide directions to the nearest locations.  For good measure, he also brings in other packages like heapq.

At the moment, Matt’s program works on a desktop, but his hope it to turn this application into a cloud based solution for use with mobile phones.  Keep an eye out for what Matt comes up with, and if you watch this, you’ll see it is an excellent tutorial on how to mash up bunches of Python packages with Arcpy.

jmJessica Molnar.  Like Caitlin, Jessica is another student you’ve seen before.  She’s got such a big heart, and is always looking for ways to apply GIS to humanitarian and ecological solutions.  In this project, Jessica created an Arcpy application to identify locations for community gardens in Baltimore City with special consideration for locations within food deserts, near churches and schools, and on suitable soils for growing food.  Jessica’s program also found those locations that were already owned by the City, but were vacant.  Let’s hope the City makes use of this to build a more beautiful Baltimore (BTW, Jessica wrote her program to work in any location in the State of Maryland, so any community can use this tool!).  I think Jessica may eventually roll this into a cloud based solution – hey Jessica, I think we found a project for graduate school!

 

jtJohn Tilghman.  John’s family owns an orthodontist practice, and John decided to use PostGRES/PostGIS along with a number of other different Python packages to perform market area analysis.  John integrated PostGRES, Google, and the Pygal libraries to create the first stages of a geodashboard to assess the effectiveness of marketing strategies, and other metrics.  In the video, you’ll also see how he created a distance decay algorithm in SQL to determine at what point customers drop off from visiting the practice.  With just a little bit of information (addresses and marketing strategies), John was able to extract a ton of business information – in fact, our guests from ESRI were surprised the John wasn’t already a business major!

This is an excellent presentation to watch for those of you who are interested in using Python with Open Source GIS – you’ll learn how to integrate FOSS4g and Python for a business analytics tool.

 

jyJosh Young.  Josh created an Arcpy script to assemble tons of location based data that might be useful for someone thinking about moving to a particular location.  Now, in Josh’s case, he chose location based data he deemed important for the neighborhood (download speeds, elementary school, crime statistics, distance to the downtown, etc.).  But ultimately, what Josh has shown us is how to create a template that integrates multiple Python packages and online data to provide very useful information.

It would be so easy to take Josh’s work and roll it into a site specific location-based analysis engine.  In fact, one of the people watching Josh’s presentation mentioned that he was moving, and saw how useful this could be for a community.  The best part of it is that Josh did it with all freely available online data for the State of Maryland, so any community can spin this up into a cloud-based solution.

 

image2

Robbie Stancil.  Robbie is our only non-geography major.  You’ve met him before when he worked with me on a National Science Foundation project to use Spatial Hadoop.  Like John, Robbie’s project used Postgres/PostGIS and the Google API to do something quite interesting: he created a mesh of points over community to determine how far the Google API will search in order to find a property address, and compared the concave hull of each series of points for an address to the actual property parcel.  This project got us thinking about some very creative uses – you’ll have to watch it until the end to see the interesting things we came up with.

 

Again, I have to give a huge shout out to the ESRI staff – they were wonderful guests, and really excellent mentors during the Q&A. As these students get ready to graduate in May, I know they will make excellent employees or graduate students – the future is really bright for them. If you are in academia, I hope that you are inspired to expect the very best of your students as I do, and you’ll be so pleased to see what they are capable of doing.

want to learn how to program geospatial solutions like these students? Check out the geospatial courses at gisadvisor.com.   

A Typical Class Project at Salisbury University: Evaluating Geocoding Accuracies

I’ve always been proud of our Salisbury University GIS students, and love to push them as far as their little minds can handle it.  You may recall that last Spring I had my Advanced GIS students perform independent GIS projects and present those projects as posters at an Undergraduate Research Conference.  Well, this Fall I am teaching GIS Programming, and have 7 awesome students (pictures and bios to follow).  We started the year off learning spatial SQL with Postgres and PostGIS.  We have now moved into Python, which includes Arcpy as well as other Python packages.

The semester was going so well, and the students were so responsive to anything I asked, I said what the heck, let’s try something crazy.  So, I showed the students how to use two Python geocoding packages (geocoder and censusgeocode) and then said:

why don’t we conduct a research project over the next week to test the match rates and positional accuracies of the Google API and the United Census Bureau API.  

So yeah, I gave them a week to put this together: design, code, analyze, and write.  And, like most of my students at this level, they didn’t disappoint me.  This meant they had to integrate a lot of what they have learned over the years (programming, GIS, statistics, etc.).

I just uploaded their work onto researchgate:

 Click for ResearchGate Article  

I was surprised by how little there is out there in terms of quantitative assessment of geocoding accuracies.  I hope you have a chance to click on the link and check out the working paper (we will submit it to a journal sometime soon).  Also, I included a short abstract below so that you can see the results of our work (note: our paper includes the original data and the source code for performing the geocoding):

Undergraduate Research in Action: Evaluating the positional differences between the Google Maps and the United States Census Bureau geocoding APIs

Abstract:  As part of a class assignment in GIS Programming at Salisbury University, students evaluated 106 geographically known addresses to determine the match rate and positional accuracy obtained using the Google and the United States Census Bureau geocoding application programming interface (API)s. The results showed that 96.2% of the addresses supplied by the Google API were successfully geocoded, while 84% of the addresses supplied by the Census Bureau API were successfully geocoded.  Further, the Google API matched 90% of the addresses with a ROOFTOP designation.  The average positional accuracy of the Google derived addresses were 80m overall, and 65m for those geocoded with the ROOFTOP designation while the Census Bureau positional accuracy was 271.09m.  

So yeah, this is what you can do with 7 GIS undergraduates at Salisbury University: they work hard, fast, and are a very creative bunch.

paper

 

 

Open Source GIS Course at UMD – the Syllabus

UMD provided me with enrollment numbers, and we are getting a pretty full class.  However, a few people on the “outside” have asked for a little more depth into what will be covered.  So, I put together a quick outline on each of our class meetings.

Keep in mind, in addition to the class meetings, there will be weekly discussions and short assignments.  Each class session is 2.5 hours long.  Those sessions colored orange are classes that I will be on the campus live (please note that this is tentative, and the dates may slide around a bit), giving the lecture and assisting students in the lab.  The other class sessions are a combination of online video lectures or in laboratory exercises.  For those with full time jobs, most of the material can be performed at home on you own devices, although being in the lab with the TA will be helpful.

Again, please feel free to email  geog-gis@umd.edu to get more information about signing up for the class. Continue reading