SQL and OGR to create 4D lines

A friend recently asked me to help him generate polyline shapefiles with Z and M values that he could deliver to a customer.  The problem was, the software he was using could import Z and M values, but could not export them.  The other problem was, he had zillions of data tables that needed to be exported!

Fortunately, PostGIS allows us to create Polyline ZM data rather easily.  The next part was to figure out how to get it exported to a shapefile.  So, here goes:

Assume you have a table with 4 columns: lineID, Z, M, and geometry.  The geometry field holds point features that represent the vertices of a line.  The lineID identifies which line each point belongs to, and the Z and M values are, well, Z and M.
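If it helps to picture the input, a minimal sketch of such a table might look like the following (the column types are my assumption; only the four columns named above come from the description):

-- Hypothetical layout of the input table described above
CREATE TABLE zpts (
    lineid   integer,            -- which line each vertex belongs to
    z        double precision,   -- Z value for the vertex
    m        double precision,   -- M value for the vertex
    geometry geometry(Point)     -- 2D point marking the vertex location
);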

Make the Polyline ZM

The SQL command to create a 4D line is:

SELECT lineid,
       ST_MakeLine(ST_MakePoint(ST_X(geometry),
                                ST_Y(geometry),
                                z, m)) AS g
FROM zpts
GROUP BY lineid;

In this case, I am making a line (ST_MakeLine) from a series of points built from the X (ST_X), Y (ST_Y), Z, and M values.  Grouping the results of the query by lineid creates one line for each set of points.
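One caveat: the aggregate form of ST_MakeLine does not guarantee vertex order on its own.  If the points table carries a sequence column (I am assuming a hypothetical vertex_order column here, which the table described above does not have), the ordering can be made explicit inside the aggregate:

-- vertex_order is assumed for illustration; substitute whatever ordering column you have
SELECT lineid,
       ST_MakeLine(ST_MakePoint(ST_X(geometry),
                                ST_Y(geometry),
                                z, m)
                   ORDER BY vertex_order) AS g
FROM zpts
GROUP BY lineid;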

When More is Less… lessons from processing large data files

My good friend Stuart Hamilton gave me a fun conundrum to try out. He has a file of province boundaries (400 areas) and lidar-derived mangrove locations (37 million points – 2.2GB in size). He wants to find the number of mangroves contained in each area.  He also wants to know which country each mangrove location is in.  An overview of the area is here:

[overview image of the study area]

but, as you zoom in, you can see that there are a tremendous number of points:

[zoomed-in image showing the density of points]

The problem

You would think that overlaying 37 million points with 400 polygons wouldn’t be too much trouble – but, it was.  Big time trouble.  In fact, after running for days in ArcGIS, Manifold GIS, Postgres/PostGIS, and SpatialHadoop, it simply would not complete.
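For reference, the straightforward way to pose the question in PostGIS is a spatial join along these lines (the table and column names are my assumptions, not the actual schema) – and at 37 million points, a query like this is what grinds away for days:

-- Naive point-in-polygon count; table and column names are assumed for illustration
SELECT p.province_id,
       count(m.geom) AS mangrove_count
FROM provinces AS p
LEFT JOIN mangrove_pts AS m
       ON ST_Contains(p.geom, m.geom)
GROUP BY p.province_id;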

New Books – How do I do that in PostGIS, How do I do that in Manifold SQL

I have two new books out – How do I do that in PostGIS and How do I do that in Manifold SQL.

From the back cover of How do I do that in PostGIS:

For those who are unsure if SQL is a sufficient language for performing GIS tasks, this book is for you. This guide follows the topic headings from the book How do I do that in ArcGIS/Manifold, as a way to illustrate the capabilities of the PostGIS SQL engine for accomplishing classic GIS tasks. With this book as a resource, users will be able to perform many classic GIS functions using nothing but SQL.


A Poor Man’s Parallel Processor for GIS

In addition to SQL, I am also interested in processing large volumes of spatial data.  One of the newest rages in “big data” is Hadoop.  According to Wikipedia:

Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

One way this is implemented is a programming model called MapReduce.  Don’t get too excited – it doesn’t have anything to do with maps or GIS – but it is very clever and powerful for certain types of problems.  The concept is that if you have a really large dataset, you divide and conquer it in a number of steps.  For example, say we wanted to know how many people named “John” are in the phonebook, and say we had 26 computers in a cluster – we might solve this by:

1.  Use each computer (1-26) to find all the “Johns” whose last name starts with a given letter (A-Z), one letter per computer.  That way, you have effectively broken the problem into 26 smaller units.

2.  Once each computer has counted up its Johns, you have reduced the dataset (hence the “reduce” in MapReduce) to 26 variables.

3.  Now, count up the total of the 26 variables.

That is an oversimplified version of course, but it helps to illustrate what we want to do.  I understand that the University of Minnesota has created a set of functions called SpatialHadoop.  I want to test this over the summer, but for now I decided to create my own poor man’s version using Postgres (a plain-SQL sketch of the phonebook idea follows below).
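To make the map/reduce idea concrete before moving to the spatial version, here is a minimal sketch of the phonebook count in plain SQL (the table and column names are invented for illustration) – each per-letter count plays the role of one computer’s work, and the final sum is the reduce step:

-- Hypothetical phonebook table; each last-name initial stands in for one worker
WITH per_letter AS (
    SELECT upper(left(last_name, 1)) AS initial,   -- "map": split the problem A-Z
           count(*) AS johns
    FROM phonebook
    WHERE first_name = 'John'
    GROUP BY upper(left(last_name, 1))
)
SELECT sum(johns) AS total_johns                   -- "reduce": combine the 26 counts
FROM per_letter;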