Fri 17 February 2006
It's amazing how many excellent open source GIS applications are out there just waiting to be discovered. I've been working with open source GIS for over 3 years now and I still find new and interesting software on a regular basis. The latest "Why haven't I heard of this before?" discovery came from the GRASS mailing list discussion on StarSpan, a tool developed at University of California at Davis "designed to bridge the raster and vector worlds of spatial analysis using fast algorithms for pixel level extraction from geometry features".
Our research project for the Ecosystem Based Management group at UCSB is in need of this exact tool in order to extract raster statistics based on a vector watersheds layer. ArcGIS and GRASS both have some of the capabilities we need through the Zonal_Statistics and v.rast.stats functions respectively. However they have their limitations and neither really handles categorical raster summaries by polygon. StarSpan looks like a more efficient option in terms of speed, scriptability and capabilities.
Installation is very smooth. It requires a recent version of GDAL (>= 1.2.6) and GEOS (>= 2.1.2). Once the dependencies are met, compilation on a unix system is as easy as configure, make, make install (There are also Windows binaries available). There is a single command line interface for all the functionality and StarSpan is able to handle all GDAL rasters and OGR vectors.
For classified rasters such as a land cover raster, we'd like to get the number of pixels for each landcover class by watershed. StarSpan creates a nice, normalized csv with three columns; The vector feature id, the raster value, and the number of pixels. There will be up to (number of features X number of classes) rows.
starspan --vector watershed.shp --raster landcover.tif --count-by-class landcover_by_watershed.csv
In order to find the percentage of a given raster class for each watershed, you can bring the csv into a relational database and do a quick SQL query. Here's an example of finding the percentage of cropland (class value is 12) for each watershed:
SELECT t.fid AS fid, (t.count::numeric / s.total::numeric) * 100 AS percentage_cropland
FROM landcover_by_watershed t,
(SELECT fid, sum(count) AS total
FROM lancover_by_watershed
GROUP BY fid) as s
WHERE t.fid = s.fid
AND t.class = 12;
Which gives us...
fid | percentage_cropland
-----+------------------------------------------------
1 | 28.571428571428571429
2 | 71.428571428571428571
3 | 36.363636363636363636
4 | 63.636363636363636364
For continuous surfaces such as elevations and slopes, we'll need to get quantitative statistics of those rasters by watershed. StarSpan can easily generate averages, mode, standard deviation, min and max:
starspan --vector watershed.shp --raster slope.tif --stats slope_stats.csv avg mode stdev min max
Which outputs a csv with one row per feature identified by feature id and each stat as a column:
FID,numPixels,avg_Band1,mode_Band1,stdev_Band1,min_Band1,max_Band1
1,25921,34.694822,38.917000,14.491952,0.347465,66.241035
2,21755,7.965552,0.000000,5.484245,0.000000,42.017155
...
While I can confirm that these small test cases work very quickly and give us pretty much the exact outputs we need, it will be interesting to see how well it stacks up to ArcGIS and GRASS when it comes to cranking out the big datasets. We'll likely try all three methods and I'll make sure to post the results.
Oh and the comparison between StarSpan and GRASS may become at moot point in the future since there is talk about integrating it with the GRASS project. While a GRASS module would be nice, not everyone has GRASS installed so I would hope the stand-alone version is still maintained since it can deal with pretty much any vector or raster data source.