perrygeo


Topological simplification of simple features

Sun 11 January 2015

The case for topology

Simple feature representations of polygon geometries are ubiquitous due to their ease of use. Thinking of spatial features as having a single, independent geometry is easy and fits most use cases. But that ease of use disappears when we need to represent the topological relationship between features.

In this article, I'll focus on one particular task with simple features data that would benefit from topology - namely simplifying a polygon dataset by removing vertices. Here's the original dataset, a 30+MB shapefile with very dense line work.

[Figure: the original dataset]

Geometries can be simplified under the Simple Features model but, since each geometry is processed independently, the topological relationships between features can be disrupted. For instance, using the Simplify Geometries tool in QGIS, I can simplify the polygons dramatically but we see gaps between polygons and other side effects.

[Figure: polygons simplified without topology, showing gaps between neighbors]
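To see why independent simplification opens gaps, here is a minimal sketch in plain Python. It is not what QGIS runs internally; it uses a hand-rolled Douglas-Peucker routine and three made-up polygons (A, B and C) that tile a 2x1 rectangle. The junction where B and C meet sits on a slight bump in A's boundary. Simplifying A alone removes that bump, while B and C keep the same point because it is a corner for them.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b (2D tuples)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring's endpoints
        return math.hypot(p[0] - a[0], p[1] - a[1])
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))

def douglas_peucker(points, tolerance):
    """Recursive Douglas-Peucker; knows nothing about neighboring features."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right

def ring_area(ring):
    """Unsigned shoelace area of a closed ring (first point == last point)."""
    total = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(total) / 2.0

# Hypothetical test data: A on the left, B and C stacked on the right.
rings = {
    "A": [(0, 0), (1, 0), (1.05, 0.5), (1, 1), (0, 1), (0, 0)],
    "B": [(1, 0), (2, 0), (2, 0.5), (1.05, 0.5), (1, 0)],
    "C": [(1.05, 0.5), (2, 0.5), (2, 1), (1, 1), (1.05, 0.5)],
}

# Each ring is simplified on its own, as in the simple features model.
simplified = {name: douglas_peucker(ring, 0.1) for name, ring in rings.items()}

area_before = sum(ring_area(r) for r in rings.values())
area_after = sum(ring_area(r) for r in simplified.values())
gap = area_before - area_after  # area no longer covered by any polygon

print("A vertices:", len(rings["A"]), "->", len(simplified["A"]))
print("uncovered area after simplification:", round(gap, 6))
```

The polygons covered the rectangle exactly before simplification; afterwards a thin sliver along A's right edge belongs to nobody. A topological model avoids this because the shared boundary is stored once and simplified once.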

The plan

Because we'll need to build topology before acting on it, the process for simplifying simple features datasets involves converting the data to topological structure, simplifying it, then converting it back to a simple features representation.

Many of the big GIS systems have their own topological data structures (ESRI's .e00 files and ArcInfo "coverages", GRASS vectors). More recently, we've seen the rise of the OpenStreetMap (OSM) format and TopoJSON, both of which model topological relationships.

Of these options, I selected TopoJSON because of its robust command-line tool, which handles building topology and simplification in one step. Additionally, it works with GeoJSON and Shapefile inputs, two of the most common data formats for simple features.

The workflow goes something like this:

1. Convert data into a shapefile with the EPSG:4326 spatial reference (lon/lat, WGS84)
2. Convert to TopoJSON and simplify
3. Convert to GeoJSON
4. Optionally, convert GeoJSON to other formats supported by OGR

To follow along, you'll need to have the following software installed:

GDAL command line utilities (we'll use ogr2ogr at the command line)
- apt-get install gdal-bin
The topojson command line utility
- npm install -g topojson
Python with the shapely package installed.
- pip install shapely

Step 1: Convert to WGS84 shapefile

If you're already working with an ESRI Shapefile or GeoJSON format and your data is already in unprojected WGS84 coordinates (i.e. EPSG:4326), you can skip to step 2.

Otherwise, ogr2ogr makes that conversion simple:

ogr2ogr -t_srs epsg:4326 -f "ESRI Shapefile" \
   ecoregions_original.shp EcoregionSummaries3.gdb.zip EcoRegions

Step 2: Convert to TopoJSON and simplify

The simplification, quantization (more on that later) and the conversion to a topological data model are all handled by the topojson command.

You have two options for specifying how aggressively you want to simplify your data.

Use a tolerance, specified in steradians with the -s flag
Use a proportion of points, 0 to 1, to retain with the --simplify-proportion flag
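The tolerance is an area rather than a distance because topojson's simplifier is based on Visvalingam's method: each vertex is ranked by the area of the triangle it forms with its two neighbors, and the least significant vertices go first. The sketch below is a naive planar version written for illustration only. The function names and the sample line are mine, it works in flat coordinates rather than steradians on a sphere, and it leaves out refinements that a real implementation has.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three 2D points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0

def visvalingam(points, min_area=None, keep=None):
    """Repeatedly drop the interior point with the smallest triangle.

    Stops when every remaining triangle is at least `min_area`
    (the tolerance style), or when only `keep` points remain
    (the proportion style). Endpoints are never removed.
    """
    pts = list(points)
    while len(pts) > 2:
        if keep is not None and len(pts) <= keep:
            break
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if min_area is not None and smallest >= min_area:
            break
        del pts[areas.index(smallest) + 1]
    return pts

# Hypothetical line: one big peak plus two small wiggles.
line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0), (5, 0.2), (6, 0)]

by_area = visvalingam(line, min_area=0.5)   # tolerance style, like -s
by_count = visvalingam(line, keep=3)        # like retaining 3 of 7 points

print(by_area)
print(by_count)
```

With the area threshold, only the two small wiggles disappear. Asking for 3 of the 7 points keeps just the endpoints and the peak. Either way, the points that survive are the ones that contribute most to the shape.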

One quirk of the topojson implementation is that it uses a relatively low quantization factor by default. Effectively, this snaps coordinates to a grid in order to save space and simplify geometries even further. This yields nice small coordinates but can result in a "stair step" effect at higher zoom levels. The default is -q 1E4 but I've found good results with -q 1E6 as recommended in the topojson docs.
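To get a feel for what the quantization factor means, here is a small sketch of snapping coordinates to a q-by-q grid stretched over a bounding box, which is my understanding of how TopoJSON quantization works. The bounding box and the two sample points are invented, and make_quantizer is a helper name I made up, not part of any library.

```python
def make_quantizer(bbox, q):
    """Return an encoder that snaps (x, y) to integer grid cells.

    bbox is (x0, y0, x1, y1); q is the number of grid positions per axis.
    """
    x0, y0, x1, y1 = bbox
    q = int(q)
    kx = (x1 - x0) / (q - 1)  # grid spacing along x, in coordinate units
    ky = (y1 - y0) / (q - 1)  # grid spacing along y

    def encode(point):
        x, y = point
        return round((x - x0) / kx), round((y - y0) / ky)

    return encode, (kx, ky)

# Hypothetical dataset extent: a 10 x 10 degree box.
bbox = (-125.0, 40.0, -115.0, 50.0)

# Two vertices 0.0003 degrees apart (a few tens of meters on the ground).
p1 = (-120.1234, 45.6789)
p2 = (-120.1231, 45.6789)

enc4, cell4 = make_quantizer(bbox, 1E4)
enc6, cell6 = make_quantizer(bbox, 1E6)

print("grid spacing at -q 1E4:", cell4[0], "degrees")
print("grid spacing at -q 1E6:", cell6[0], "degrees")
print("1E4:", enc4(p1), enc4(p2))
print("1E6:", enc6(p1), enc6(p2))
```

For this extent the grid spacing is about 0.001 degrees at 1E4 and about 0.00001 degrees at 1E6. The two vertices collapse onto the same grid cell at 1E4 but stay distinct at 1E6, which is where the stair-step look at high zoom levels comes from.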

As an example, let's take our ecoregions_original.shp and convert it to topojson with a tolerance of 1E-8 steradians. We want to explicitly state that the data is in spherical (unprojected) coordinates and retain the properties of the original attribute table:

topojson --spherical \
        --properties \
        -s 1E-8 \
        -q 1E6 \
        -o temp.topojson \
        ecoregions_original.shp

Step 3: Convert to GeoJSON

This part was a bit trickier than I anticipated. Luckily Sean Gillies has written some preliminary python functions for converting topojson geometries to standard GeoJSON-like python dictionaries.

In order to make a higher-level conversion utility, I started working on topo2geojson.py which provides a command line interface to perform TopoJSON to GeoJSON conversions.

python topo2geojson.py temp.topojson ecoregions_simple.geojson

There is some additional logic to ensure validity of polygons, though it is very basic and I'm sure there are ways to make the geometry conversions more robust. Please note that I've only tested this script on this one dataset and it likely needs additional work to be considered a full-fledged conversion tool; consider it more of a starting point than an out-of-the-box solution.
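For readers curious about what the conversion involves, here is a simplified sketch of the core idea. It is not the code from topo2geojson.py or from Sean Gillies' functions. It assumes the TopoJSON layout as I understand it: arcs are delta-encoded when a transform is present, a negative arc index i means arc ~i traversed in reverse, and rings are stitched by chaining arcs end to end. The tiny topology is hand-written, and only Polygon geometries are handled.

```python
def decode_arc(topology, index):
    """Absolute coordinates for one arc; negative index means reversed."""
    arc = topology["arcs"][~index if index < 0 else index]
    transform = topology.get("transform")
    coords = []
    x = y = 0
    for position in arc:
        if transform:
            # Quantized arcs store the first position, then deltas.
            x += position[0]
            y += position[1]
            sx, sy = transform["scale"]
            tx, ty = transform["translate"]
            coords.append([x * sx + tx, y * sy + ty])
        else:
            coords.append([position[0], position[1]])
    return coords[::-1] if index < 0 else coords

def stitch_ring(topology, arc_indexes):
    """Chain arcs into one ring, dropping each duplicated join point."""
    ring = []
    for index in arc_indexes:
        coords = decode_arc(topology, index)
        ring.extend(coords[1:] if ring else coords)
    return ring

def polygon_feature(topology, geometry):
    """Build a GeoJSON-like Feature dict from a TopoJSON Polygon."""
    rings = [stitch_ring(topology, ring) for ring in geometry["arcs"]]
    return {
        "type": "Feature",
        "properties": geometry.get("properties", {}),
        "geometry": {"type": "Polygon", "coordinates": rings},
    }

# Hand-written example: two unit squares sharing the edge x = 1 (arc 0).
topology = {
    "type": "Topology",
    "transform": {"scale": [0.5, 0.5], "translate": [0.0, 0.0]},
    "arcs": [
        [[2, 0], [0, 2]],                     # shared edge (1,0) -> (1,1)
        [[2, 2], [-2, 0], [0, -2], [2, 0]],   # rest of the left square
        [[2, 0], [2, 0], [0, 2], [-2, 0]],    # rest of the right square
    ],
    "objects": {"squares": {"type": "GeometryCollection", "geometries": [
        {"type": "Polygon", "arcs": [[0, 1]], "properties": {"name": "left"}},
        {"type": "Polygon", "arcs": [[-1, 2]], "properties": {"name": "right"}},
    ]}},
}

features = [polygon_feature(topology, g)
            for g in topology["objects"]["squares"]["geometries"]]
left = features[0]["geometry"]["coordinates"][0]
right = features[1]["geometry"]["coordinates"][0]
print(left)
print(right)
```

The shared edge is stored once as arc 0 and used by both squares, forwards in one and reversed in the other. That is why simplifying the arcs cannot pull the two polygons apart.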

Optional Step 4: Convert to any OGR format

Once data is in GeoJSON format, we're free to do what we want with it, including converting it back to a shapefile or any other OGR supported data format.

ogr2ogr -f "ESRI Shapefile" ecoregions_simple.shp ecoregions_simple.geojson OGRGeoJson

Case study: evaluating simplification tolerances

In the remainder of this article, I'll walk through a demonstration of these steps in order to find an optimal simplification tolerance for my test data. The optimal tolerance depends on your needs, the scales at which you will use your data and how aggressively you need to reduce file size. Ultimately, it's a tradeoff between small file size and accurate line work.

We can easily script this workflow in order to test multiple simplification tolerances. As a bonus, we can fire off multiple iterations at once to leverage multiple cores. Since I've got 4 cores on my laptop, I can run 4 processes in nearly the same time it takes to run 1 using some simple shell tricks (Linux/OS X only; sorry Windows users, but I don't know .bat files well enough to demonstrate):

for tolerance in 1E-7 1E-8 1E-9 1E-10
do
    # Build topology and simplify
    topojson --spherical \
        --properties \
        -s "$tolerance" \
        -q 1E6 \
        -o "temp_$tolerance.topojson" \
        ecoregions_original.shp &&

    # Convert it to GeoJSON
    python topo2geojson.py "temp_$tolerance.topojson" "temp_$tolerance.geojson" &&

    # Optionally, convert GeoJSON to any OGR data source
    ogr2ogr -f "ESRI Shapefile" "ecoregions_$tolerance.shp" "temp_$tolerance.geojson" OGRGeoJson &
done

# Block until every background pipeline has finished
wait
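If shell job control isn't an option, the same idea can be sketched in Python with the standard library. This is an untested-on-Windows alternative, not something I used for the results in this article. It assumes topojson, python and ogr2ogr are all on the PATH, and the function names are my own. On Windows, npm-installed commands may need a full path or shell=True to be found.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_commands(tolerance, source="ecoregions_original.shp"):
    """Return the three commands for one tolerance (given as a string)."""
    topo = f"temp_{tolerance}.topojson"
    geo = f"temp_{tolerance}.geojson"
    return [
        ["topojson", "--spherical", "--properties",
         "-s", tolerance, "-q", "1E6", "-o", topo, source],
        ["python", "topo2geojson.py", topo, geo],
        ["ogr2ogr", "-f", "ESRI Shapefile",
         f"ecoregions_{tolerance}.shp", geo, "OGRGeoJson"],
    ]

def run_pipeline(tolerance):
    """Run the steps in order, stopping at the first failure (like &&)."""
    for command in build_commands(tolerance):
        subprocess.run(command, check=True)
    return tolerance

def run_all(tolerances, workers=4):
    """Run one pipeline per tolerance, up to `workers` at a time.

    Threads are enough here because the real work happens in child
    processes; each thread just waits on its own subprocess.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_pipeline, tolerances))
```

Calling run_all(["1E-7", "1E-8", "1E-9", "1E-10"]) would then be the rough equivalent of the loop above, including waiting for all four pipelines to finish.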

Then we can take a look at the resulting .topojson file sizes:

$ ls -lh *.topojson
-rw-rw-r-- 1 mperry mperry 4.5M Jan 11 12:25 temp_1E-10.topojson
-rw-rw-r-- 1 mperry mperry 2.1M Jan 11 12:25 temp_1E-9.topojson
-rw-rw-r-- 1 mperry mperry 869K Jan 11 12:25 temp_1E-8.topojson
-rw-rw-r-- 1 mperry mperry 362K Jan 11 12:25 temp_1E-7.topojson

OK, so with a simplification tolerance of 1E-10 steradians, we get a 4.5M file. If we loosen the tolerance to 1E-7, we get a 362K file, roughly a 12.5x reduction. Is the reduction in file size worth the reduction in geometric accuracy? The only way to find out is to render maps of the resulting datasets and visually assess them.

[Figure: side-by-side renderings of the original, 1E-7, 1E-8 and 1E-9 versions]

The first thing we notice is that all of the results have retained topology, with no gaps or slivers introduced. That is the key benefit of this workflow.

Next, we notice that at this scale (roughly 1:500k on my monitor) we can barely see a difference between the 1E-9 version and the original, while the 1E-7 version looks a bit too simplified and chunky. So, in this case, we can say that a simplification tolerance of around 1E-8 steradians is an optimal balance of file size and detail.

Of course other datasets, scales and uses may have completely different results so please try it out and let me know how it goes. Just don't settle for simple features simplification next time you need to reduce file sizes!