Sun 11 January 2015
The case for topology
Simple feature representations
of polygon geometries are ubiquitous due to their ease of use.
Thinking of spatial features as having a single, independent geometry is easy and fits most use cases.
But that ease of use disappears when we need to represent the topological
relationship between features.
In this article, I'll focus on one particular task with simple features data that
would benefit from topology - namely simplifying a polygon dataset by removing vertices.
Here's the original dataset, a 30+MB shapefile with very dense line work.
Geometries can be simplified under the Simple Features model but, since
each geometry is processed independently, the topological relationships between
features can be disrupted. For instance, using the Simplify Geometries
tool in
QGIS, I can simplify the polygons dramatically but we see gaps between polygons
and other side effects.
The plan
Because we'll need to build topology before acting on it, the process for simplifying
simple features datasets involves converting the data to topological structure,
simplifying it, then converting it back to a simple features representation.
Many of the big GIS systems (ESRI's .e00, ArcInfo "coverages", and GRASS vectors)
have their own topological data structures. More recently, we've seen the rise of
Open Street Map (OSM) format and TopoJSON, both of which model topological relationships.
Of these options, I selected TopoJSON
because of it's robust command-line tool
which handles building topology and simplification in one step. Additionally, it
works with GeoJSON and Shapefile inputs, two of the most common
data formats for simple features.
The workflow goes something like this:
- Convert data into a shapefile with the EPSG:4326 spatial reference (lonlat, wgs84)
- Convert to topojson and simplify
- Convert to geojson
- Optionally, convert geojson to other formats supported by OGR
To follow along, you'll need to have the following software installed:
- GDAL command line utilities (we'll use
ogr2ogr
at the command line)
- The
topojson
command line utility
- Python with the
shapely
package installed.
Step 1: Convert to WGS84 shapefile
If you're already working with an ESRI Shapefile or GeoJSON format and your data
is already in unprojected WGS84 coordinates (i.e. EPSG:4326), you can skip to step 2.
Otherwise, ogr2ogr
makes that conversion simple:
ogr2ogr -t_srs epsg:4326 -f "ESRI Shapefile" \
ecoregions_original.shp EcoregionSummaries3.gdb.zip EcoRegions
Step 2: Convert to TopoJSON and simplify
The simplification, quantization (more on that later) and the conversion to
a topological data model are handled by topojson
You have two options for specifying how aggressively you want to simplify your data.
- Use a tolerance, specified in steridians with the
-s
flag
- Use a proportion of points, 0 to 1, to retain with the
--simplify-proportion
flag
One quirk of the topojson implementation is that it uses a relatively low quantization factor by default.
Effectively, this snaps coordinates to a grid in order to save space and simplify geometries even further.
This yields nice small coordinates but can result in a "stair step" effect at higher
zoom levels. The default is -q 1E4
but I've found good results with -q 1E6
as
recommended in the topojson docs.
As an example, let's take our ecoregions_original.shp
and convert it to topojson
with a tolerance of 1E-8
steridians. We want to make sure we explicitly mention
that the data is in spherical (unprojected) coordinates and to retain the properties
of the original attribute table:
topojson --spherical \
--properties \
-s 1E-8 \
-q 1E6 \
-o temp.topojson \
ecoregions_original.shp
Step 3: Convert to GeoJSON
This part was a bit trickier than I anticipated. Luckily Sean Gillies has written
some preliminary python functions
for converting topojson geometries to standard GeoJSON-like python dictionaries.
In order to make a higher-level conversion utility, I started working on topo2geojson.py which provides a command line
interface to perform TopoJSON to GeoJSON conversions.
python topo2geojson.py temp.topojson ecoregions_simple.geojson
There is some additional logic to ensure validity of polygons though it is very
basic and I'm sure there are ways to make the geometry conversions more robust.
Please note that I've only tested this script on this one dataset and it likely needs
additional work to be considered a full-fledged conversion tool; consider it more of a
starting point than an out-of-the box solution.
Optional Step 4: Convert to any OGR format
Once data is in GeoJSON format, we're free to do what we want with it, including
converting it back to a shapefile or any other OGR supported data format.
ogr2ogr -f "ESRI Shapefile" ecoregions_simple.shp ecoregions_simple.geojson OGRGeoJson
Case study: evaluating simplification tolerances
In the remainder of this article, I'll walk through a demonstration of these steps
in order to find an optimal simplification tolerance for my test data. The optimal
tolerance depends on your needs, what scales you will be using your data and how
aggressively you need to reduce file size. Ultimately, it's a ** tradeoff between
low geometry size and accurate line work**.
We can easily script this solution in order to test multiple simplification tolerances.
As a bonus, we can fire off multiple iterations at once to leverage multiple cores.
Since I've got 4 cores on my laptop, I can run 4 processes in nearly the same time
it takes to run 1 using some simple shell tricks (Linux/OSX only; sorry Windows users but I don't know .bat files well enough to demonstrate)
for tolerance in 1E-7 1E-8 1E-9 1E-10
do
topojson --spherical \
--properties \
-s $tolerance \
-q 1E6 \
-o temp_$tolerance.topojson \
ecoregions_original.shp &&
# Convert it to GeoJSON
python topo2geojson.py temp_$tolerance.topojson temp_$tolerance.geojson &&
# Optionally, convert GeoJSON to any OGR data source
ogr2ogr -f "ESRI Shapefile" ecoregions_$tolerance.shp temp_$tolerance.geojson OGRGeoJson &
done
wait
Then we can take a look at the resulting .topojson file sizes
$ ls -lh *.topojson
-rw-rw-r-- 1 mperry mperry 4.5M Jan 11 12:25 temp_1E-10.topojson
-rw-rw-r-- 1 mperry mperry 2.1M Jan 11 12:25 temp_1E-9.topojson
-rw-rw-r-- 1 mperry mperry 869K Jan 11 12:25 temp_1E-8.topojson
-rw-rw-r-- 1 mperry mperry 362K Jan 11 12:25 temp_1E-7.topojson
OK, so with a simplification tolerance of 1E-10 steridians, we can get a 4.5M file.
If we reduce it to 1E-7, we can get 362k file - a 12.5x reduction. Is the reduction
in file size worth the reduction in geometric accuracy? The only way to find out is to
render maps of the resulting datasets and visually assess them.
First thing that we notice - all of the results have retained topology with no gaps or slivers introduced.
(the key benefit to this workflow).
Next, we notice that at this scale (roughly 1:500k on my monitor) we can barely
see a difference between the 1E-9 version and the original. And the 1E-7 version
looks a bit too simplified and chunky. So, in this case, we can say that a simplification
tolerance of around 1E-8 steridians is an optimal balance of file size and detail.
Of course other datasets, scales and uses may have completely different results so please try
it out and let me know how it goes. Just don't settle for simple features simplification next time you need to
reduce file sizes!