Once again, I am serving as a mentor in a National Science Foundation (NSF) Research Experiences for Undergraduates (REU) program. This year we’ve decided to build a QGIS plug-in for terrain analysis, since terrain operations such as slope and aspect are embarrassingly parallel. We are generating slope for digital elevation models of different sizes in four ways:
- A pure Python implementation (for an easy plug-in)
- A serial C++ implementation (as a baseline for a non-parallel solution)
- A PyCUDA implementation (using the GPU for parallel processing)
- A C++-based parallel implementation using the GPU
We plan to put our results on a shared GitHub site (we are still cleaning up the code) so that people can start experimenting with it and use our example to begin building more parallel solutions for QGIS (or GDAL, for that matter).
Here are some early results:
As you can see, pure Python is seriously slow, but PyCUDA is surprisingly fast. You will also notice that, at this early stage, our serial C++ implementation is still faster than PyCUDA, as is native QGIS. A big issue right now is the I/O, and we’ll be working on that over the next couple of weeks. We are also working on better ways of moving the data from host memory onto the CUDA device, so PyCUDA should start to get even faster.
But I have a bigger concern, based on previous research I’ve done in this area over the years. Specifically, GIS operations like slope and aspect involve few computations per data element (e.g., a 3×3 kernel for slope needs only about 20 calculations to generate an answer for the center cell). GPU processing really thrives when you have lots of computations per data element; with so little work per cell, the cost of moving data to and from the device tends to dominate.
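To make that low computation count concrete, here is a minimal pure-Python sketch of slope for a single 3×3 window. It assumes Horn's finite-difference method, which is one common choice and not necessarily what each of our four implementations uses. The function name `horn_slope_degrees` and the `cell_size` parameter are made up for illustration.

```python
import math


def horn_slope_degrees(window, cell_size=1.0):
    """Slope (in degrees) for the center cell of a 3x3 elevation window.

    Assumes Horn's method; `window` is three rows of three elevations,
    and `cell_size` is the ground distance between neighboring cells.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window

    # Weighted differences between opposite sides of the window.
    dz_dx = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dy = ((g + 2 * h + i) - (a + 2 * b + c)) / (8 * cell_size)

    # Steepest gradient magnitude, converted to an angle.
    return math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))


# A plane that rises one unit per cell from left to right.
ramp = [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]
print(horn_slope_degrees(ramp))
```

Nine reads, a handful of adds and multiplies, one square root, and one arctangent: that is all the work a cell gets, which is why the data movement matters so much.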
What I would really like to do is find a raster operation with a massive number of calculations per data element. Hillshade has more computations (slope, aspect, and the hillshade itself), so that might be a better candidate.
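As a rough illustration of the extra work, here is a hedged sketch of hillshade for one 3×3 window. It assumes Horn-style gradients and simple Lambertian shading, written as a dot product between the surface normal and the sun direction, which folds slope and aspect into the normal rather than computing them as separate angles. The name `hillshade`, the 0 to 1 output range, and the default sun position (azimuth 315°, altitude 45°) are assumptions for this example, not our plug-in's API.

```python
import math


def hillshade(window, cell_size=1.0, azimuth_deg=315.0, altitude_deg=45.0):
    """Lambertian hillshade in [0, 1] for the center cell of a 3x3 window.

    Assumes row 0 of `window` is the northern row and column 0 the western
    column; azimuth is measured clockwise from north.
    """
    (a, b, c), (d, _center, f), (g, h, i) = window

    # Horn-style gradients toward the east and toward the north.
    dz_de = ((c + 2 * f + i) - (a + 2 * d + g)) / (8 * cell_size)
    dz_dn = ((a + 2 * b + c) - (g + 2 * h + i)) / (8 * cell_size)

    # Unit surface normal (east, north, up).
    norm = math.sqrt(dz_de ** 2 + dz_dn ** 2 + 1.0)
    nx, ny, nz = -dz_de / norm, -dz_dn / norm, 1.0 / norm

    # Unit vector pointing from the surface toward the sun.
    az = math.radians(azimuth_deg)
    alt = math.radians(altitude_deg)
    lx = math.sin(az) * math.cos(alt)
    ly = math.cos(az) * math.cos(alt)
    lz = math.sin(alt)

    # Cells facing away from the sun are clamped to zero.
    return max(0.0, nx * lx + ny * ly + nz * lz)


# A surface rising toward the east faces west, so a western sun lights it fully.
ramp = [[0, 1, 2],
        [0, 1, 2],
        [0, 1, 2]]
print(hillshade(ramp, azimuth_deg=270.0))
```

Even here the per-cell cost is only a few dozen floating-point operations plus a handful of trigonometric calls, and the sun vector could be computed once for the whole raster, so hillshade is heavier than slope but still not "massive."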
So, that is where you come in. Are there any raster-based algorithms that work on 3×3 kernels that you would like to see implemented? I’ll be honest with you: I’ve been racking my brain trying to find one that has a massive number of computations per data element, but I have yet to think of something interesting (simply moving to 5×5, 7×7, etc. kernels is boring to me).
I need to acknowledge my undergraduate students: Alex Fuerst of Xavier, Charlie Kazer of Swarthmore College, and William Hoffman of Salisbury University. I will be doing a post on these guys in the next couple of weeks.