Creating a Local OSM Extract

Your own little OpenStreetMap database

Overview

  • Introduction
  • OSM Data
  • Import Scenarios
  • Keeping it Up to Date
  • Using it
  • Questions

OpenStreetMap Data

What is OSM?

“The Wikipedia of Maps”

OSM is a dataset, which is…

  • global
  • open
  • used to map nearly anything
  • one of my favorite things

Is anyone actually using it?

  • Amazon Logistics
  • Tesla Smart Summon feature
  • Esri (basemaps, feature services)
  • Pokemon Go
  • Red Cross
  • Kendall County!

OpenStreetMap Data:
It’s weird!

The OSM dataset has:

  • NO layers
  • NO schema
  • NO polygons

The Structure

OSM data is made up of:

  1. Nodes
  2. Ways (ordered series of nodes)
  3. Relations (collection of other elements)

Let’s look at an example!

Looking at it: Nodes

{
  "elements": [
    {
      "type": "node",
      "id": 6650982353,
      "lat": 41.6620246,
      "lon": -88.5328721,
      "tags": {
        "direction": "forward",
        "highway": "stop"
      }
    },
    {
      "type": "node",
      "id": 235170089,
      "lat": 41.6630576,
      "lon": -88.5327984,
      "tags": {
        "railway": "crossing",
        "crossing:bell": "yes"
      }
    },
    {
      "type": "node",
      "id": 235170094,
      "lat": 41.6637334,
      "lon": -88.5327199
    }
  ]
}

Looking at it: Ways

{
  "elements": [
    {
      "type": "way",
      "id": 21814608,
      "nodes": [
        6650982353,
        235170089,
        235170094
      ],
      "tags": {
        "highway": "tertiary",
        "name": "Lew Street"
      }
    },
    {
      "type": "node",
      "id": 6650982353,
      "lat": 41.6620246,
      "lon": -88.5328721,
      "tags": {
        "direction": "forward",
        "highway": "stop"
      }
    },
    {
      "type": "node",
      "id": 235170089,
      "lat": 41.6630576,
      "lon": -88.5327984,
      "tags": {
        "railway": "crossing",
        "crossing:bell": "yes"
      }
    },
    {
      "type": "node",
      "id": 235170094,
      "lat": 41.6637334,
      "lon": -88.5327199
    }
  ]
}

Looking at it: Areas

{
  "elements": [
    {
      "type": "way",
      "id": 21814608,
      "nodes": [
        6650982353,
        235170089,
        235170094,
        235170095,
        6650982353
      ],
      "tags": {
        "building": "office",
        "name": "City Hall"
      }
    }
  ]
}

Looking at it: Areas?

{
  "elements": [
    {
      "type": "way",
      "id": 21814608,
      "nodes": [
        6650982353,
        235170089,
        235170094,
        235170095,
        6650982353
      ],
      "tags": {
        "barrier": "wall"
      }
    }
  ]
}

Looking at it: Relations

{
  "elements": [
    {
      "type": "relation",
      "id": 10004268,
      "members": [
        {
          "type": "way",
          "ref": 21814608,
          "role": ""
        }
      ],
      "tags": {
        "name": "Plano City Road 1",
        "network": "US:IL:Plano",
        "type": "route",
        "route": "road"
      }
    },
    {
      "type": "way",
      "id": 21814608,
      "nodes": [
        6650982353,
        235170089,
        235170094,
        235170095,
        6650982353
      ],
      "tags": {
        "barrier": "wall"
      }
    }
  ]
}

Looking at it: Relations

{
  "elements": [
    {
      "type": "relation",
      "id": 10004268,
      "members": [
        {
          "type": "way",
          "ref": 21814608,
          "role": "outer"
        },
        {
          "type": "way",
          "ref": 21814722,
          "role": "inner"
        },
        {
          "type": "way",
          "ref": 21815110,
          "role": "inner"
        },
        {
          "type": "way",
          "ref": 21814600,
          "role": "inner"
        }
      ],
      "tags": {
        "building": "yes"
      }
    }
  ]
}

So how does this…

…turn into this:

Getting the Data

Sources

mkdir -p data/cache
cd data
wget -N https://download.geofabrik.de/north-america/us/illinois-latest.osm.pbf

Prep the Data

Osmium

A “swiss army knife” for OSM data.

  • Extracting geographic subset
  • Getting metadata about objects
  • Extract by attribute
  • Generate / apply change files
  • and more!

Audience
Participation!

geojson.io

Extracting with Osmium

osmium extract \
  -p /app/city.geojson \
  /app/data/illinois-latest.osm.pbf \
  -o /app/data/city-extract.osm.pbf --overwrite \
  -s smart -S types=any

Importing with imposm

Pros / Cons

  • Easy to use and configure
  • Built in tag “cleaning”
  • Table generalization
  • AOI filtering on import and update
  • Relations are harder to work with
  • No advanced geometry processing or configuration
  • AOI does not apply to relations
  • Development seems to have plateaued

Table Config

Mapping OSM data to tables is done using a YAML config file.

highways-mapping.yml
tables:
  roads:
    columns:
    - name: osm_id
      type: id
    - name: the_geom
      type: geometry
    - name: type
      type: mapping_value
    - key: name
      name: name
      type: string
    filters:
      reject:
        area: ["yes"]
    mapping:
      highway:
      - motorway
      - trunk
      - primary
      - secondary
      - tertiary
      - residential
    type: linestring

Importing: Single Table

imposm import \
  -mapping /app/imposm-scenarios/highways-mapping.yml \
  -read /app/data/city-extract.osm.pbf \
  -cachedir /app/data/cache/highways \
  -write -connection postgis://gis:gis@database:5432/osm?prefix=state_highways \
  -deployproduction \
  -optimize \
  -overwritecache

Importing: Single Table

imposm import \
  -config /app/imposm-scenarios/highways-config.json \
  -read /app/data/city-extract.osm.pbf \
  -write \
  -deployproduction \
  -optimize \
  -overwritecache


highways-config.json
{
    "cachedir": "/app/data/cache/highways",
    "mapping": "/app/imposm-scenarios/highways-mapping.yml",
    "connection": "postgis://gis:gis@database:5432/osm?prefix=state_highways"
}

Importing: Multiple Tables

imposm import \
  -config /app/imposm-scenarios/city-parks-config.json \
  -read /app/data/city-extract.osm.pbf \
  -deployproduction -optimize -write -overwritecache

Importing: EVERYTHING

And prepping for future updates!

imposm import \
  -config /app/imposm-scenarios/city-all-config.json \
  -read /app/data/city-extract.osm.pbf \
  -deployproduction -optimize -write -overwritecache \
  -diff


city-all-config.json
{
    "cachedir": "/app/data/cache/city-all",
    "mapping": "/app/imposm-scenarios/city-all-mapping.yml",
    "connection": "postgis://gis:gis@database:5432/osm?prefix=city_all",
    "diffdir": "/app/data/diff"
}

Everything?

city-all-mapping.yml
tags:
  load_all: true
  exclude: [created_by, source, "tiger:*"]
tables:
  nodes:
    mapping:
      __any__: [__any__]
    columns:
    - name: osm_id
      type: id
    - name: the_geom
      type: geometry
    - name: tags
      type: hstore_tags
    type: point

Importing With osm2pgsql

Pros / Cons

  • Well established, continued development
  • Extremely configurable
  • Allows geometry processing “mid-stream”
  • Exports to many coordinate systems
  • Decent default config
  • Lots of examples
  • Updating requires a replication URL, not easily limited to AOI
  • Custom configuration can be harder to adjust / understand
  • Built-in generalization is experimental

Importing: Defaults

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm

Take the plunge!

The osm2pgsql maintainers strongly encourage using the flex output, especially if you’re just starting out using it. The default output does not always take advantage of new capabilities.

Flex Output

Mapping to tables is done with a Lua file.

pois.lua
local pois = osm2pgsql.define_table({
    name = 'pois',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'class', not_null = true },
        { column = 'subclass' },
        { column = 'tags', type = 'jsonb'},
        { column = 'geom', type = 'point', not_null = true },
  }
})

function process_poi(object, geom)
    local a = {
        name = object.tags.name,
        geom = geom,
        tags = object.tags
    }

    if object.tags.amenity then
        a.class = 'amenity'
        a.subclass = object.tags.amenity
    elseif object.tags.shop then
        a.class = 'shop'
        a.subclass = object.tags.shop
    else
        return
    end

    pois:insert(a)
end

function osm2pgsql.process_node(object)
    process_poi(object, object:as_point())
end

function osm2pgsql.process_way(object)
    if object.is_closed then
        process_poi(object, object:as_polygon():centroid())
    end
end

Importing: POIs

osm2pgsql \
  -j /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/pois.lua

Importing: Road Routes

routes.lua
local ways = osm2pgsql.define_table({
    name = 'routes',
    ids = { type = 'way', id_column = 'osm_id' },
    columns = {
        { column = 'id', sql_type = 'serial', create_only = true },
        { column = 'ref' },
        { column = 'name' },
        { column = 'network' },
        { column = 'tags', type = 'jsonb' },
        { column = 'the_geom', type = 'multilinestring', not_null = true}
    }
})

function osm2pgsql.process_relation(object)
    if object.tags.route == 'road' then
        ways:insert({
            tags = object.tags,
            the_geom = object:as_multilinestring():line_merge(),
            ref = object.tags.ref,
            name = object.tags.name,
            network = object.tags.network
        })
    end
end

osm2pgsql \
  -j /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/routes.lua

Importing: Everything

osm2pgsql \
  -j /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/city-all.lua

All nodes, ways, and some relations1 are dumped into tables with all their tags.

“Everything” Tables

Why Would I Want This?

  • Schema remains as flexible as OSM itself
  • As use cases change, only the queries need to be modified, not the data
  • No re-importing / re-indexing
  • Query output can be used identically to any SQL table using >> data accessor
  • Can use queries to populate Materialized Views in postgres

Updating

The Easy Way

Just re-run the import steps whenever a new extract is available from the original source!

For GeoFabrik, this means your data is only a day or so behind the main OSM dataset, depending on:

  1. When the extract was “cut”
  2. When you ran your import

The Automatic Way

  • Throw all your commands into a script
  • Add in a wget or curl command to download extract file
  • Schedule the script to run every 24 hours

Really up to date: imposm

As long as you included -diff in the original import command, updating with imposm simple:

Use imposm run along with the same properties or config file as before.

imposm run -config /app/imposm-scenarios/city-all-config.json

Really up to date: osm2pgsql

Updating with osm2pgsql needs a replication URL. It works great with the diffs at planet.osm.org.

Issues

  • That URL is for global changes
  • Replication URLs for Geofabrik exist, but they are daily, not minutely
  • Setting up a custom replication URL is not a trivial thing

To limit the changes to your area of interest is possible, but requires some custom coding to work.

If “live” data is needed, probably just use imposm.

Using It

jsonb columns

Using the column ->> 'key' syntax returns the value for the specified key.

SELECT
  osm_id,
  the_geom,
  tags ->> 'cuisine' as cuisine,
  tags ->> 'name' as name
FROM osm2pgsql_nodes
WHERE tags ->> 'amenity' = 'restaurant'

Sample Queries

traffic_roads.sql
SELECT
  osm_id,
  the_geom,
  tags -> 'name' name,
  tags -> 'highway' class
FROM osm2pgsql_ways
WHERE tags -> 'highway' IN (
  'primary',
  'secondary',
  'tertiary',
  'motorway',
  'trunk',
  'unclassified',
  'residential',
  'service'
)

SELECT
    osm_id,
    the_geom,
    COALESCE(tags -> 'natural', tags->'landuse') type,
    tags
FROM osm2pgsql_areas
WHERE COALESCE(tags -> 'natural', tags->'landuse') IS NOT NULL

cycling_infrastructure.sql
select
    osm_id,
    the_geom,
    tags -> 'name' name,
    case
        when tags -> 'highway' in ('cycleway', 'footway', 'path') then 'separated'
        else 'on road'
    end class,
    tags -> 'highway' type,
    tags -> 'bicycle' bicycle_access
from osm2pgsql_ways
where tags -> 'bicycle' != 'no'
or tags -> 'cycleway' = 'lane'

charging_stations.sql
select
  osm_id,
  the_geom,
  tags -> 'access' access
from osm2pgsql_nodes
where tags -> 'amenity' = 'charging_station'

What to Do

Honestly, once you write the query, you can use the results for anything that you would a normal table, including:

  • Publishing a feature service to the web
  • Generating vector tiles
  • Using as input in geoprocessing
  • Create an ArcGIS Locator dataset
  • Create a routable graph network

It was noted earlier, but if you’re going to be providing map layers, services, etc., directly from this database, consider writing your query to a Materialized View for a performance boost.

Just don’t forget to refresh them now and then.

Questions?

Be in Touch!