My friend is attempting to quantify the area of different landuse values for different areas that are upstream from her sample points. This means she needs sample points, landuse, and upstream areas (i.e. sub-watersheds). The problem is, her watersheds overlap, the buffer distances around the sample points overlap themselves AND the watersheds, and she then needs to summarize the results. It’s actually a tricky problem due to the overlaps: GIS software doesn’t really like when features within a single layer overlap one another. Also, if a buffer for a sample point overlaps two different watersheds, that becomes tricky too.
Sure, you can solve it with a few for loops, inserting the results into a new table, but that really is a hassle. Also, I have to do it for different distances and different land cover types.
So, I once again turned to SQL – remember what I keep telling you – spatial is not special. It’s just another data type. This video steps you through performing a multi-ring buffer on overlapping objects from 3 different layers: sample points, watersheds, and land use. As we step through the SQL, you’ll see how easy it is to put the query together. And, at the end, you’ll see how flexible the query is should you want to change your objectives. And, for good measure, we’ll throw in a little bit of parallel processing.
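To make the overlap logic concrete outside of the video, here is a minimal sketch in plain Python rather than the SQL used in the video. It is not the query from the video. It assumes a heavily simplified setup: watersheds are rectangles, land use is a grid of cell centres, and the names `summarize`, `ring_edges`, and `cell_area` are made up for illustration. The point it tries to show is that if every (sample point, watershed) pair is evaluated on its own and then grouped, overlapping features stop being a problem.

```python
from collections import defaultdict
from math import hypot

def summarize(points, watersheds, cells, ring_edges, cell_area=1.0):
    """Sum land-use area per (point, watershed, ring, land-use class).

    points:     {point_id: (x, y)}
    watersheds: {ws_id: (xmin, ymin, xmax, ymax)}  # rectangles, an assumption
    cells:      [(x, y, landuse)] cell centres of a land-use grid
    ring_edges: increasing outer distances, e.g. [1, 2] gives rings 0-1 and 1-2
    """
    totals = defaultdict(float)
    for pid, (px, py) in points.items():
        # Each point/watershed pair is handled independently, so overlapping
        # watersheds simply produce their own rows in the result.
        for wid, (xmin, ymin, xmax, ymax) in watersheds.items():
            for cx, cy, landuse in cells:
                if not (xmin <= cx <= xmax and ymin <= cy <= ymax):
                    continue  # cell is outside this watershed
                dist = hypot(cx - px, cy - py)
                # Pick the first ring whose outer edge reaches the cell.
                ring = next((r for r in ring_edges if dist <= r), None)
                if ring is not None:
                    totals[(pid, wid, ring, landuse)] += cell_area
    return dict(totals)

# Hypothetical data: one sample point and two overlapping watersheds.
points = {"s1": (0.0, 0.0)}
watersheds = {"w1": (-5, -5, 5, 5), "w2": (0, -5, 5, 5)}
cells = [(0.5, 0, "forest"), (1.5, 0, "urban"), (-1.5, 0, "forest"), (3, 0, "urban")]

result = summarize(points, watersheds, cells, ring_edges=[1, 2])
for key in sorted(result):
    print(key, result[key])
```

Changing the buffer distances only means passing a different `ring_edges` list, which is roughly the kind of flexibility the SQL version gives you when your objectives change.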
When you were in Statistics 101, and the Professor said ok, we are now going to learn about the Central Limit Theorem, did you tune out? Did you sarcastically say when is someone going to grab me and order me to tell them about the Central Limit Theorem? Come on, admit it, you did. Well, so did I – I was 18 years old, and couldn’t care less.
Well, you know what? Understanding the Central Limit Theorem has really big implications for big data analytics. Check out this 20-minute video, and you’ll see that by applying the Central Limit Theorem and some statistical theory, you can approximate the results of an expensive multi-server implementation for interrogating really large databases.
I’ll show you how you can obtain very precise estimates on really large databases by simply applying some basic statistics you should have learned Freshman year (but you were too busy partying, weren’t you?)
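If you want a feel for the idea before watching, here is a small sketch in Python. It is not the workflow from the video, just the textbook version: draw a simple random sample, compute its mean, and use the Central Limit Theorem to put an approximate 95% confidence interval around it. The `fares` list is synthetic data standing in for a large table column, and `estimate_mean` is a name I made up for the example.

```python
import math
import random
import statistics

def estimate_mean(population, sample_size, z=1.96, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, margin), where margin is the half-width of an
    approximate 95% confidence interval (z = 1.96) based on the
    Central Limit Theorem.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, z * std_err

# Synthetic, skewed data standing in for a large column (an assumption).
data_rng = random.Random(42)
fares = [data_rng.expovariate(1 / 12.0) for _ in range(200_000)]

estimate, margin = estimate_mean(fares, sample_size=2_000)
true_mean = statistics.fmean(fares)

print(f"sample estimate: {estimate:.2f} +/- {margin:.2f}")
print(f"full-scan mean:  {true_mean:.2f}")
```

Here the sample touches only 1% of the rows, and the margin shrinks with the square root of the sample size, not the size of the table. That is why sampling can stand in for a full scan when an estimate with a known error is good enough.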
Stay tuned. I’ll be coming out with a big data analytics class in the New Year. If you want to learn more about SQL, programming, open source GIS, or Manifold, check out courses at www.gisadvisor.com.
Continuing my series on big data geoanalytics, I wanted to show how to bring in large data sets so that we can start working with them. The data set we’ll use is the NYC taxi data that includes information on pickups and dropoffs. There are about 13 million records in a 2.2GB .csv file. That is not insanely large, but it is large enough for us to start messing around with it (don’t worry, I have a few 20GB+ data sets that I am working with and will eventually show those to you as well).
The video below will walk you through the steps I took to load and prepare the NYC taxi data inside of Manifold Future. My next posts will look at how we can begin interrogating the data source to find meaningful information.
I hope you enjoy the video. Please comment below – I’d love to hear what people think.
In my last video, I gave a sort of mile-high view of how SQL can be used for big data geoanalytics. I want to dive a little deeper, and explore the idea of creating linear features from a time-series of points.
Once again, using some basic SQL and spatial SQL, we can perform basic time-series analysis.
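The core of the idea is just "group by the thing that moves, order by time, connect the dots." Here is a minimal sketch of that in Python rather than SQL. The record layout `(track_id, timestamp, x, y)`, the cab IDs, and the function name `points_to_lines` are all assumptions for illustration, not the schema used in the video.

```python
from collections import defaultdict

def points_to_lines(records):
    """Turn (track_id, timestamp, x, y) records into WKT LINESTRINGs.

    Points are grouped by track_id and ordered by timestamp, which mirrors
    a GROUP BY with an ORDER BY inside the line-building aggregate.
    """
    tracks = defaultdict(list)
    for track_id, timestamp, x, y in records:
        tracks[track_id].append((timestamp, x, y))

    lines = {}
    for track_id, pts in tracks.items():
        pts.sort()  # tuples sort by timestamp first
        if len(pts) < 2:
            continue  # a line needs at least two vertices
        coords = ", ".join(f"{x} {y}" for _, x, y in pts)
        lines[track_id] = f"LINESTRING ({coords})"
    return lines

# Hypothetical, unordered GPS fixes for two cabs.
records = [
    ("cab_a", 2, -73.98, 40.75),
    ("cab_a", 1, -73.99, 40.74),
    ("cab_b", 1, -73.95, 40.78),
    ("cab_a", 3, -73.97, 40.76),
]

lines = points_to_lines(records)
for track_id, wkt in lines.items():
    print(track_id, wkt)
```

Note that `cab_b` drops out because a single point cannot form a line. In a real data set you would want to decide up front how to treat those tracks.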
I’m enjoying making these videos, as they are helping me put my course on big data and GIS together. I hope you like them too. Please comment down below so that I know this is something the user community enjoys and is learning from.
Also, if you are interested in learning more about how to perform spatial SQL in Microsoft SQL Server, Postgres, or Manifold, visit my other site, www.gisadvisor.com, to sign up for my online video courses.
I’m getting ready to create a course in big data analytics with GIS. I have lots of ideas as to what to do, but one thing I know is that I will be using spatial databases and SQL. I’ll also be using Manifold Future.
ESRI has recently introduced their ArcGIS GeoAnalytics Server, which will introduce many GIS professionals to big data analytics with GIS. They have some interesting scenarios and example data using NYC taxi cabs. I think these will be really good case studies.
This video (just shy of 20 minutes) will use SQL and Manifold to try and address these big data problems.
Keep an eye on my blog as I will be rolling out new ideas as I prepare my course for the Spring.
If you like the video, and want to learn more about how to improve your spatial database skills, check out my videos at www.gisadvisor.com.