About Art Lembo

I am a Professor at Salisbury University where I teach courses in quantitative geography and GIS.

Big data analytics with GIS – the CSULB pilot

Well folks, it’s happening. I’m about to take one of my most adventurous steps yet with these training classes.

With the release of Manifold 9, I’m going to offer a big data analytics class that includes gigabytes of data, multiple databases, statistical processing, and parallel processing. And you will be able to participate using only freely available software. Imagine that: a big data analytics class with free software!

Delivering 20GB of data at a bring-your-own-device (BYOD) training class is a challenge. With high-level work like this, it is also a challenge to decide what can fit into a one-day workshop.

Thankfully, California State University, Long Beach gave me the opportunity to teach my workshop to their students this week. It was a blast!

More importantly, I learned a lot about how to put together a deep-dive class like this. Eight hours is simply too short!

The students loved the workshop, and I loved teaching it. Stay tuned: a live workshop will be coming to a city near you, and an abbreviated online workshop will roll out in the next month.


Work smarter – not larger

When you were in Statistics 101 and the Professor said, “OK, we are now going to learn about the Central Limit Theorem,” did you tune out? Did you sarcastically say, “When is someone ever going to grab me and order me to tell them about the Central Limit Theorem?” Come on, admit it, you did. Well, so did I. I was 18 years old and couldn’t care less.

Well, you know what? Understanding the Central Limit Theorem has really big implications for big data analytics. Check out this 20-minute video, and you’ll see that by applying the Central Limit Theorem and some statistical theory, you can approximate the results of an expensive multi-server implementation for interrogating really large databases.

I’ll show you how you can obtain very precise estimates on really large databases by simply applying some basic statistics you should have learned freshman year (but you were too busy partying, weren’t you?).
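To make the idea concrete, here is a minimal Python sketch (not the method from the video): draw a simple random sample from a large column of values, then use the Central Limit Theorem to put a confidence interval around the sample mean. The synthetic “fare” column, its size, and the sample size are assumptions chosen only for illustration, and the finite population correction is ignored because the sample is a tiny fraction of the data.

```python
import math
import random
import statistics


def estimate_mean(population, sample_size, confidence=0.95, seed=None):
    """Estimate a population mean from a simple random sample.

    By the Central Limit Theorem the sample mean is approximately normal,
    so mean +/- z * s / sqrt(n) is an approximate confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    # Two-sided critical value from the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, (mean - z * std_err, mean + z * std_err)


# Hypothetical "fare" column: one million skewed (non-normal) values.
rng = random.Random(42)
fares = [rng.expovariate(1 / 15) for _ in range(1_000_000)]

est, (low, high) = estimate_mean(fares, sample_size=10_000, seed=1)
print(f"sample estimate: {est:.2f}  95% CI: ({low:.2f}, {high:.2f})")
print(f"full-scan mean:  {statistics.fmean(fares):.2f}")
```

Even though the fares are skewed, the sample mean behaves normally, and the interval width shrinks with the square root of the sample size. That is why a sample of a few thousand rows can stand in for a full scan when an estimate is all you need.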


Stay tuned: I’ll be coming out with a big data analytics class in the New Year. If you want to learn more about SQL, programming, open source GIS, or Manifold, check out courses at www.gisadvisor.com.

Salisbury Gives Back – 4 new GIS tools

My students had a great time presenting the tools they created in my GIS Programming class. I want to thank our friends from Esri for joining us via GoToMeeting. Their feedback and advice were greatly appreciated.

Not only does this add new tools that fill specific niches in ArcGIS, it also shows that ArcPy is simple enough to create these tools quickly. Further, it shows just how good the Salisbury University undergraduate Geography students are. Click the links below to see the presentations (we are currently putting the code together so people can download it). Stay tuned for that link.

Join Count Analysis: This tool focuses on the spatial autocorrelation method of join count analysis that evaluates area features with binary variables to determine if the data is random, dispersed or clustered.  The tool calculates the expected and observed dissimilar joins, the Z score, and associated p-value.  The presenters are Cody Garcia, Kyle Lane, and Alex Nowak.
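For readers curious about what the statistic involves, here is a minimal Python sketch of a join count test on dissimilar (BW) joins. It is not the students’ ArcPy code; it assumes the free-sampling (with replacement) form of the test, binary symmetric adjacency (e.g., rook or queen contiguity), and a neighbor dictionary built beforehand. The variance is the sum of the single-join variances plus the covariance of every pair of joins that share a feature; the nonfree-sampling variant uses a different variance.

```python
import math


def join_count_bw(values, neighbors):
    """Join count test on dissimilar (BW) joins under free sampling.

    values:    list of 0/1 attributes, one per area feature
    neighbors: dict mapping feature index -> set of adjacent indices (symmetric)
    Returns (observed BW, expected BW, z score, two-sided p-value).
    Assumes both categories (0 and 1) are present.
    """
    n = len(values)
    p = sum(values) / n                     # proportion of features coded 1
    q = 1 - p
    # Count each join once (i < j) and tally the dissimilar ones.
    joins = [(i, j) for i in neighbors for j in neighbors[i] if i < j]
    J = len(joins)
    observed = sum(1 for i, j in joins if values[i] != values[j])

    expected = 2 * J * p * q
    # Ordered pairs of distinct joins that share a feature: sum of L_i(L_i - 1).
    shared = sum(len(nb) * (len(nb) - 1) for nb in neighbors.values())
    variance = J * 2 * p * q * (1 - 2 * p * q) + shared * (p * q - 4 * p**2 * q**2)

    z = (observed - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    return observed, expected, z, p_value


# 2x2 checkerboard with rook adjacency: every join is dissimilar.
values = [1, 0, 0, 1]
neighbors = {0: {1, 2}, 1: {0, 3}, 2: {0, 3}, 3: {1, 2}}
result = join_count_bw(values, neighbors)
print(result)
```

A positive z score means more dissimilar joins than expected (a dispersed pattern); a negative z score means fewer (a clustered pattern).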


Quadrat Analysis: This tool focuses on the spatial autocorrelation of point patterns across a landscape. Using points and a grid of quadrats, this tool measures whether a point pattern is random, dispersed, or clustered.  The tool calculates the variance to mean ratio, Chi-square value, and associated p-value.  In addition, the tool thematically shades the quadrats based on their counts.  The presenters are Jeremy Gencavage, Grant Chalfin, and Zach Radziewicz.
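The core calculation is short. Here is a minimal Python sketch, not the students’ tool, assuming a regular grid defined by a lower-left corner, a cell size, and a number of rows and columns (all hypothetical parameters). It counts points per quadrat, then computes the variance-to-mean ratio (VMR) and the chi-square statistic, which equals (m − 1) × VMR for m quadrats.

```python
import statistics


def quadrat_counts(points, xmin, ymin, cell_size, ncols, nrows):
    """Count points falling in each cell of a regular grid of quadrats."""
    xmax = xmin + ncols * cell_size
    ymax = ymin + nrows * cell_size
    counts = [0] * (ncols * nrows)
    for x, y in points:
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue                      # skip points outside the grid
        # Points exactly on the far edge go into the last column/row.
        col = min(int((x - xmin) // cell_size), ncols - 1)
        row = min(int((y - ymin) // cell_size), nrows - 1)
        counts[row * ncols + col] += 1
    return counts


def quadrat_test(counts):
    """Return (variance-to-mean ratio, chi-square, degrees of freedom)."""
    m = len(counts)
    mean = statistics.fmean(counts)       # assumes at least one point
    vmr = statistics.variance(counts) / mean   # sample variance (m - 1)
    chi_square = (m - 1) * vmr            # equals sum((n - mean)**2) / mean
    return vmr, chi_square, m - 1


# Four points in the lower-left quadrat, one point outside the grid.
points = [(0.2, 0.3), (0.5, 0.5), (0.7, 0.1), (0.9, 0.9), (5.0, 5.0)]
counts = quadrat_counts(points, xmin=0, ymin=0, cell_size=1, ncols=2, nrows=2)
vmr, chi_square, df = quadrat_test(counts)
print(counts, vmr, chi_square, df)
```

A VMR near 1 suggests a random pattern, above 1 a clustered one, and below 1 a dispersed one. The p-value comes from the chi-square distribution with m − 1 degrees of freedom, for example `scipy.stats.chi2.sf(chi_square, df)` if SciPy is available.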


Stratified Sampling Tool – This is not a spatial tool, but rather a tool that takes stratified samples and generates point and interval estimates from the strata. Generating estimates from stratified sampling is a very powerful statistical technique, and provides significant improvements over simple random sampling. The presenters are Bryan VanGiesen and Brian Hiller.


ArcGIS and Google Route Optimization Tool –  This tool integrates ArcGIS with the Google Maps API so that users can route the ArcGIS features over a Google network, using the Google Routing Engine.  Users can input their own geodatabase feature classes, and the tool returns the Google routes as a geodatabase feature class, along with driving directions.  The presenters are Meghan Murphy, Thomas Simpson, and Liam Doherty.

Salisbury Gives Back – free GIS tools

I want to invite you to join the Salisbury University Geography students as they present their final GIS programming projects on Monday, December 18, at 4:30 PM in Henson 153. Snacks will be provided*.

Last year we had a really successful presentation of student programming assignments as part of our GIS Programming class. For this year, we decided to do something different. Rather than students selecting their own projects, I identified four topics that I thought were underserved by traditional GIS tools. That is, the tools simply don’t exist as far as I know. And, because the tools aren’t readily available, they aren’t applied within our discipline – hence the name, Salisbury Gives Back.

So, our students created four separate ArcGIS Script tools that will run right in ArcGIS:

Join Count Analysis Tool – This tool focuses on the spatial autocorrelation method of join count analysis that evaluates area features with binary variables to determine if the data is random, dispersed or clustered.  An example would be to determine the spatial autocorrelation of voting patterns for the US Presidential Elections (i.e. red vs. blue states).  The tool calculates the expected and observed dissimilar joins, the Z score, and associated p-value.

Sadly, this useful method is rarely used because a tool does not currently exist to perform join count analysis. However, with this tool, geographers can now evaluate spatial autocorrelation with binary variables. In fact, given this tool’s ease of use, I expect to see Political Geographers use it to evaluate elections throughout the US and beyond.

Quadrat Analysis Tool – This tool focuses on the spatial autocorrelation of point patterns across a landscape. Using points and a grid of quadrats, this tool measures whether a point pattern is random, dispersed, or clustered and is frequently used in hazard analysis for things like wildfire distribution, tornado touchdowns, or dispersion of crime.   The tool calculates the variance to mean ratio, Chi-square value, and associated p-value.  In addition, the tool thematically shades the quadrats based on their counts.

Similar to the Join Count Analysis, there does not appear to be a tool that exists to accomplish the task, and therefore the approach is rarely used in our discipline.

Stratified Sampling Tool – This is not a spatial tool, but rather a tool that takes stratified samples and generates point and interval estimates from the strata. Generating estimates from stratified sampling is a very powerful statistical technique, and provides significant improvements over simple random sampling. A good example is to estimate yearly household utility usage in a community by sampling homes from three different strata: large, medium, and small. The tool allows the user to enter the data for each stratum, along with the level of confidence (e.g., 90%, 95%), and provides the point estimate and confidence interval.

Stratified sampling is rarely used because there isn’t a tool that can perform the task. However, with this tool, geographers can now easily determine confidence intervals for estimation of averages, totals, and proportions when considering different strata.
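Here is a minimal Python sketch of the standard stratified estimator of a mean, not the students’ implementation. Each stratum mean is weighted by its share of the population, W_h = N_h / N, and the variance combines each stratum’s sampling variance with a finite population correction. It assumes a normal critical value; with very small samples a t value would be more appropriate. The utility figures below are made up purely to show the call.

```python
import math
import statistics


def stratified_mean(strata, confidence=0.95):
    """Point estimate and confidence interval for a stratified sample mean.

    strata: list of (N_h, sample_values) pairs, where N_h is the number of
            units in stratum h and sample_values holds at least two values.
    """
    N = sum(N_h for N_h, _ in strata)
    estimate = 0.0
    variance = 0.0
    for N_h, sample in strata:
        n_h = len(sample)
        W_h = N_h / N                            # stratum weight
        estimate += W_h * statistics.fmean(sample)
        fpc = 1 - n_h / N_h                      # finite population correction
        variance += W_h**2 * fpc * statistics.variance(sample) / n_h
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * math.sqrt(variance)
    return estimate, (estimate - margin, estimate + margin)


# Hypothetical yearly utility usage sampled from large, medium, and small homes.
strata = [
    (200, [2400, 2650, 2300, 2550]),             # large homes
    (500, [1500, 1600, 1450, 1550, 1500]),       # medium homes
    (300, [900, 950, 1000, 850]),                # small homes
]
est, (low, high) = stratified_mean(strata, confidence=0.95)
print(f"estimate: {est:.1f}  95% CI: ({low:.1f}, {high:.1f})")
```

Multiplying the estimate and its interval by N gives the corresponding estimate of the community total.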

ArcGIS and Google Route Optimization Tool – Currently, network analysis in ArcGIS requires Network Analyst. Network Analyst requires a significant effort not only to create a network but also to maintain it. Further, most organizations don’t have up-to-date speed limits or real-time traffic observations. This tool integrates ArcGIS with the Google Maps API so that users can route the ArcGIS features over a Google network, using the Google Routing Engine. Users can input their own geodatabase feature classes, and the tool returns the Google routes as a geodatabase feature class, along with driving directions. In addition, the user can select different route types (e.g., driving, walking, mass transit). If you ever wanted to route your ArcGIS data over a Google network, you’ll want to stick around for this presentation!
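Two pieces of plumbing sit at the heart of this kind of integration: building a Google Directions API request, and turning the encoded polyline that Google returns into coordinates that can become line geometry. Here is a minimal Python sketch of both, not the students’ tool. The API key is a placeholder, the network call itself is omitted, and writing the result to a geodatabase feature class (e.g., with ArcPy) is left out.

```python
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def directions_request_url(origin, destination, mode="driving", api_key="YOUR_API_KEY"):
    """Build a Directions API request URL for two (lat, lng) points."""
    params = {
        "origin": f"{origin[0]},{origin[1]}",
        "destination": f"{destination[0]},{destination[1]}",
        "mode": mode,            # driving, walking, bicycling, or transit
        "key": api_key,          # placeholder; a real key is required
    }
    return f"{DIRECTIONS_URL}?{urlencode(params)}"


def decode_polyline(encoded):
    """Decode a Google encoded polyline into a list of (lat, lng) tuples."""
    coords = []
    index = lat = lng = 0
    while index < len(encoded):
        deltas = []
        for _ in range(2):                   # latitude delta, then longitude delta
            result = shift = 0
            while True:
                b = ord(encoded[index]) - 63
                index += 1
                result |= (b & 0x1F) << shift
                shift += 5
                if b < 0x20:                 # last 5-bit chunk of this value
                    break
            # Undo the zig-zag sign encoding.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / 1e5, lng / 1e5))
    return coords


url = directions_request_url((38.5, -120.2), (43.252, -126.453), mode="walking")
route = decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@")
print(route)  # -> [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

In a Directions response, the encoded string for the whole route is found under each route’s `overview_polyline`, and the decoded vertices map directly onto a polyline feature.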

We hope that you can join us for the presentations and code walk-through.  If you know of other people who might be interested in these tools, please do not hesitate to pass this email along.

* We plan to record all four sessions, and I will post them here next week, along with the associated toolbox (assuming I don’t mess up the recording!).


Big Data GeoAnalytics – adding data

Continuing my series on big data geoanalytics, I wanted to show how to bring in large data sets so that we can start working with them. The data set we’ll use is the NYC taxi data, which includes information on pickups and dropoffs. There are about 13 million records in a 2.2GB .csv file. That is not insanely large, but it is large enough for us to start messing around with it (don’t worry, I have a few 20GB+ data sets that I am working with and will eventually show those to you as well).

The video below walks you through the steps I took to load and prepare the NYC taxi data inside Manifold Future. My next posts will look at how we can interrogate the data source to find meaningful information.
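If you want to try something similar outside Manifold, the key idea is to stream the file in batches rather than reading 2.2GB into memory at once. Here is a minimal Python sketch that loads a CSV into a SQLite table in chunks. SQLite stands in for whatever database you use, and the column names in the tiny sample are assumptions about the taxi file’s layout. Every value is stored as text here; a real workflow would cast coordinates to numbers and build geometry afterward.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(csv_file, conn, table, batch_size=100_000):
    """Stream a CSV into a SQLite table without reading it all into memory."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    total = 0
    while True:
        batch = list(islice(reader, batch_size))   # next chunk of rows
        if not batch:
            break
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', batch)
        total += len(batch)
    conn.commit()
    return total


# Tiny stand-in for the taxi file (hypothetical column names).
sample = io.StringIO(
    "pickup_datetime,pickup_longitude,pickup_latitude\n"
    "2013-01-01 00:00:00,-73.98,40.75\n"
    "2013-01-01 00:05:00,-73.99,40.73\n"
    "2013-01-01 00:09:00,-73.95,40.78\n"
)
conn = sqlite3.connect(":memory:")
loaded = load_csv_in_batches(sample, conn, "trips", batch_size=2)
print(loaded)
```

For the real file you would pass an open file handle instead, e.g. `open(path, newline="")` with a path of your own. Memory use stays bounded by the batch size no matter how many rows the file holds.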

I hope you enjoy the video. Please comment below – I’d love to hear what people think.


Big Data GeoAnalytics – Turning Points to Lines

In my last video, I gave a sort of mile-high view of how SQL can be used for big data geoanalytics. I want to dive a little deeper and explore the idea of creating linear features from a time series of points.

Once again, using some basic SQL and spatial SQL, we can perform basic time-series analysis.
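The logic behind turning points into lines is simple: group the points by the thing that moved, order each group by time, and connect the vertices in that order. Here is a minimal Python sketch of that logic, not the SQL from the video. It assumes each point carries a track ID and a sortable timestamp, and it emits WKT so the result can be loaded into most spatial databases.

```python
from collections import defaultdict


def points_to_lines(points):
    """Connect time-stamped points into one line per track ID.

    points: iterable of (track_id, timestamp, x, y) tuples.
    Returns {track_id: WKT LINESTRING} for tracks with at least two points.
    """
    tracks = defaultdict(list)
    for track_id, ts, x, y in points:
        tracks[track_id].append((ts, x, y))
    lines = {}
    for track_id, pts in tracks.items():
        pts.sort()                              # order vertices by timestamp
        if len(pts) < 2:
            continue                            # a line needs two vertices
        coords = ", ".join(f"{x} {y}" for _, x, y in pts)
        lines[track_id] = f"LINESTRING ({coords})"
    return lines


# Hypothetical GPS pings, deliberately out of time order for cab1.
pings = [
    ("cab1", "2013-01-01 00:05", -73.99, 40.73),
    ("cab1", "2013-01-01 00:00", -73.98, 40.75),
    ("cab2", "2013-01-01 00:01", -73.95, 40.78),
]
lines = points_to_lines(pings)
print(lines)  # -> {'cab1': 'LINESTRING (-73.98 40.75, -73.99 40.73)'}
```

In a spatial database the same group-order-connect pattern is a single aggregate query; in PostGIS, for example, `ST_MakeLine(geom ORDER BY ts)` with a `GROUP BY` on the track ID does the job.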

I’m enjoying making these videos, as they are helping me put my course on big data and GIS together.  I hope you like them too.  Please comment down below so that I know this is something the user community enjoys and is learning from.

Also, if you are interested in learning more about how to perform spatial SQL in Microsoft SQL Server, Postgres, or Manifold, visit my other site, www.gisadvisor.com to sign up for my online video courses.

Big data geo-analytics with SQL

I’m getting ready to create a course in big data analytics with GIS.  I have lots of ideas as to what to do, but one thing I know is that I will be using spatial databases and SQL.  I’ll also be using Manifold Future.

Esri has recently introduced their ArcGIS GeoAnalytics Server, which will introduce many GIS professionals to big data analytics with GIS. They have some interesting scenarios and example data using NYC taxi cabs. I think these will be really good case studies.

This video (just shy of 20 minutes) uses SQL and Manifold to try to address these big data problems.
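As a flavor of what plain SQL can do with this kind of question, here is a minimal sketch that bins taxi pickups into grid cells and counts them. It is not the query from the video. It runs against SQLite through Python, and the table name, column names, cell size, and grid origin are assumptions for illustration. The cells are square in degrees, not meters, and the `WHERE` clause keeps offsets non-negative so that `CAST(... AS INTEGER)`, which truncates toward zero, behaves like a floor.

```python
import sqlite3


def pickups_per_cell(conn, cell_size, min_lon, min_lat):
    """Aggregate pickup points into square (in degrees) grid cells with SQL."""
    sql = """
        SELECT CAST((pickup_longitude - :min_lon) / :cell AS INTEGER) AS cell_x,
               CAST((pickup_latitude  - :min_lat) / :cell AS INTEGER) AS cell_y,
               COUNT(*) AS pickups
        FROM trips
        WHERE pickup_longitude >= :min_lon AND pickup_latitude >= :min_lat
        GROUP BY cell_x, cell_y
        ORDER BY pickups DESC
    """
    params = {"cell": cell_size, "min_lon": min_lon, "min_lat": min_lat}
    return conn.execute(sql, params).fetchall()


# Hypothetical trips table with three pickups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (pickup_longitude REAL, pickup_latitude REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(-73.985, 40.755), (-73.984, 40.756), (-73.955, 40.783)],
)
cells = pickups_per_cell(conn, cell_size=0.01, min_lon=-74.0, min_lat=40.7)
print(cells)
```

Because the whole aggregation happens inside the database, the same query works whether the table holds three rows or thirteen million; only the run time changes.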

Keep an eye on my blog as I will be rolling out new ideas as I prepare my course for the Spring.

If you like the video and want to learn more about how to improve your spatial database skills, check out my videos at www.gisadvisor.com.

Great work by my undergraduates, again!

You’ve heard about how good my undergraduate GIS students are here, here, and here.  Oh yeah, and here and here.    Well, over the last few years my undergraduate students have been working with the campus Department of Horticulture by surveying the Salisbury University campus under the direction of Dr. Dan Harris.  Did you know our beautiful campus is a registered arboretum?  Starting with my student Waverly Thompson two years ago, they surveyed all the trees, sidewalks, sprinkler heads, light poles, pretty much anything you can think of with survey grade instruments.

This past summer Zach Radziewicz, Josh Young, and Lindsey Pinder turned that survey work into a beautiful cartographic product, and yesterday they put the map online here. Zach has an art background (he’s also an awesome GIS student), so that really helped, and Josh has been helping me lead professional workshops in Postgres and Python. Lindsey is a rising GIS star in our Department, so keep an eye out for more posts about her.

It never ceases to amaze me how good our undergraduate students are here at SU.  I’m proud of the work they do, and so thankful to get to work with them each day.

Great job, Waverly, Zach, Josh, and Lindsey.

P.S. While I was in Korea and Dr. Harris was in Brazil, on their own, these students turned the entire map into a 3D visualization – I’ll post that soon.

New poll: Help me pick my next workshop.

I have had a great time giving live workshops: each one has sold out, and the reviews have been fantastic (see here and here).

As people have asked me to do more advanced GIS workshops, I thought I’d ask those of you who read my blog what you would be interested in. As a side note, I am starting to put together a Big Data Analytics for GIS workshop. I will definitely offer that as a class at my university, but I think I can also boil it down to a 16-hour, two-day workshop. Does that sound interesting to people?

Anyway, check out the poll, add some comments, and let me know if your community would like me to come out and give a workshop.

Don’t forget, if you want advanced GIS training on your own time, you can grab one of my video courses here.