From days to seconds: experiences with parallel processing and GIS (Part I, the team)

Many of you know that I have been working with parallel processing for GIS in the form of general-purpose computing on graphics processing units (GPGPU), which runs on the video card. However, this year I decided to change things up a bit and focus more on CPU-based parallel processing. To that end, we began working with Hadoop along with spatial Hadoop.

I plan to have around 7 or 8 blog posts on this over the next 4 or 5 weeks.  My initial outline is:

Part I, the team (this post)
Part II, the point-in-polygon problem
Part III, solving the problem in hours, not days
Part IV, solving the problem in minutes, not hours
Part V, solving the problem in seconds, not minutes
Part VI, lessons learned and challenge to the GIS community
Part VII, advice on building your own server

In the posts, I will tell you what we did and how we did it, and we will also assemble our code.

So, spoiler alert: yes, we actually took a classic GIS process that required days to complete and made some adjustments to complete it in hours. We then created our own cluster of 4 computers with 16 CPUs to complete the problem in minutes using parallel processing. Finally, we went all-out and rented time on Amazon EC2 to complete the job in the realm of seconds (BTW, the EC2 rental cost around $5.00 for the job).

But first, I want to introduce you to the undergraduates who worked with me on this project.

My Research Interns

This summer, as part of a National Science Foundation (NSF) Research Experience for Undergraduates (REU), I worked with two very smart guys: Alan Young and Robbie Stancil.

Robbie Stancil

A Junior Math and Computer Science major, Robbie is an extraordinary student. If there was something you could do in college, Robbie did it. Just some of the things Robbie has done in 3 years at Salisbury University include being part of the Bellevance Honors Program, a Presidential Citizens Scholar, a Bellevance Honors Student Ambassador, Resident Assistant (RA) for the Honors Living Learning Community, Salisbury University student member of the Alumni Association, and the 2013-2014 Manokin Hall Residents Council Treasurer. Outside of academics, Robbie has also held internships at NASA Wallops Flight Facility and ADNET Systems.

As you might imagine, Robbie’s name has become a fixture on the Dean’s list at Salisbury University.   He sort of reminds me of when I was a student, with the only difference being my writeup would have said, Art Lembo was a mediocre college student, drank beer his freshman year, and was not smart enough to get a scholarship in college.  Other than that, we’re practically identical.

Alan Young

Alan Young comes to us from Berry College in Georgia, where he is a Sophomore Math and Computer Science major. Alan holds multiple scholarships at Berry, including the Berry Academic scholarship and the Georgia Zell Miller scholarship. He has a job in the IT Department at Berry College, and is also involved in a number of organizations on campus including the math club, computer science club, and the Baptist collegiate ministries. Alan has participated in a number of computer science programming competitions. It is hard to believe he has only completed his Sophomore year in college!

Both of these guys are exceptionally smart and hardworking students. I loved working with them. While I am thrilled that Robbie will be here next semester as a student, I am so sad to say goodbye to Alan as he returns to Berry College. I can’t wait to start writing recommendation letters for these two in the next couple of years.

My next post will go over the problem we faced, the data we used, and the roadmap we set for ourselves. Stay tuned.

4 thoughts on “From days to seconds: experiences with parallel processing and GIS (Part I, the team)”

  1. I am wicked excited for the next edition of this. Are you doing pure math and computer science algorithms or are you building off of a library that has already been created? Which language will you be working with? Will the code be available on GitHub?

