Your Very Own OSM!

Creating a Local OSM Extract

openstreetmap

workshop

postgis

A step-by-step guide to using osm2pgsql and osmium to create a regional extract of OpenStreetMap

Overview

This guide was written to accompany an in-person workshop which includes the related OSM extract presentation. Users completing this guide are assumed to have basic familiarity with OpenStreetMap data.

Goals

By the end of this guide, you should have a database with a few OpenStreetMap layers in it for the area you’re intersted in.

More than that, you should have a basic understanding of how the component parts of the process work, and how to extend this to meet your own particular needs or interests.

Associated Repo

Much of the information, configuration files, and commands in this workshop can be found in the associated jcarlson/osm-extract-workshop repo on Codeberg. You won’t necessarily need it, but you may find it useful for quickly copying Lua files and terminal commands.

Requirements

Data

A Geofabrik extract of your choice, in the *.osm.pbf format
A geoJSON file representing your area of interest

geojson.io is great for getting a quick geoJSON

Software

Optional

QGIS
pgAdmin

QGIS isn’t really required to complete the workshop, but it’s not going to be much fun if you can’t see and work with your data.

Another option is to connect to your database with pgAdmin, where you can at least query the tables and preview the spatial component. If you’re using the workshop container (see next section), pgAdmin is part of the compose file. Sign in with the email gis@example.com, password gis.

Note

Previously, this workshop material covered the use of imposm. I decided to cut that from the workshop in 2024 in order to focus more on osm2pgsql, which I consider more robust and adaptable. I’d rather the workshop be a deeper dive on one tool than a shallow introduction to both.

Software Alternatives: Docker / Podman

Maybe you can’t / don’t want to install all that on your machine. No worries! Using a container system like Docker or Podman, you can run everything in its own space, and you can easily trash it all when you’re done.

Whether you use Docker or Podman or something else entirely is up to you. This workshop has been tested in the two named options and works well in both. We’ll refer to “Docker” for convenience, but if you’re on Podman, more power to ya.

Run Your Own Containers

A simple option here is to use a pre-built Docker image with osm2pgsql and osmium already set up for you. If you want access to the latest versions of things, there is a Dockerfile in the workshop repo. Its built counterpart is on Docker Hub.

Getting the image is as simple as:

docker pull jdcarls2/osm-extract-workshop:foss4gna2024

You can also use a container to create your PostgreSQL / PostGIS database, regardless of where the other tools are installed. This is especially useful for trying out the process without affecting the rest of your system.

Isaac Boates’ Docker Images

Credit where it’s due: the image I’ve put on Docker Hub is adapted from Isaac Boates’ incredibly useful osm-utilities-docker repo.

The key difference with those Docker images is that the containers use osm2pgsql and osmium as their entrypoints. That means that running the container is the same as running those commands.

It’s a convenient feature, but in practice, it means that the terminal commands you’ll use to interact with them are specific to that image. For the workshop, I wanted to show commands that would work the same in or out of Docker. Rather than running a command and exiting, the the workshop image will stay on and wait for user input.

docker run iboates/osm2pgsql:latest [args] == osm2pgsql [args]

Compose File

If you want to get your environment running with the right OSM tools as well a PostGIS database all in one go, consider using a compose file. Here’s a simple example:

docker-compose.yml

version: '3'

services:
  database:
    image: 'postgis/postgis:latest'
    ports:
      - 55432:5432
    environment:
      POSTGRES_USER: gis
      POSTGRES_PASSWORD: gis
      POSTGRES_DB: osm
    networks:
      - osm  

  osm:
    image: 'jdcarls2/osm-extract-workshop:foss4gna2024'
    volumes:
      - ./app:/app
    tty: true
    networks:
      - osm

volumes:
  db-data:

networks:
  osm:
    driver: bridge

1: Maps the default postgres port (5432) to our machine’s port 55432. This makes our DB available outside of the Docker image.
2: Anything in the local app directory will be available in the osm-tools image in a folder of the same name. This is how we can easily pass data files in and out.
3: Keeps the image alive so that we can access an interactive terminal

Sending Commands to your Docker Image

GUI

In Docker Desktop, it’s quite easy. Click on the osm container, then switch to the terminal tab.

CLI

Once your container is running, all you need is the command docker attach [container-name]. Your terminal will be “attached” to that instance, letting you send terminal commands to it.

Create a geographic subset: `osmium`

Osmium is a veritable Swiss Army knife for working with OpenStreetMap data. It does all sorts of things, like:

Osmium tool manual

extracting geographic subsets
getting metadata about objects
extracting data by attributes
generating and applying change files

For our purposes, it’s the geographic subset we’re interested in.

osmium extract \
  -p city.geojson \
  /app/data/illinois-latest.osm.pbf \
  -o /app/data/city-extract.osm.pbf --overwrite \
  -s smart -S types=any

1: Our area of intest
2: The Geofabrik file
3: The new output
4: The extract strategy for features that cross our AOI boundary.

You might not need this if you’re working with the entire Geofabrik extract, like an entire state or region. But if you’re working with a geographic subset, trimming it down with osmium ahead of our import steps will speed things up a lot.

After running the above command, you ought to have a smaller .osm.pbf file created.

Your Turn!

Extract a subset of your data file. Compare file sizes - the extract should be smaller than the original file.

Optionally, run osmium fileinfo your-extract-file.osm.pbf to see information about it. Compare with the output of the same command against the original file.

Your AOI spans two (or more) files!

Suppose you’re trying to create a geographic subset for an area that spans multiple Geofabrik files. It would be pretty inefficient to download the entire region or country just to get a little bit of the data over that line.

This is particularly relevant, given that this workshop is happening in St. Louis, a metro area that straddles a state line.

So, what do we do?

Just download both extracts and use osmium extract on both of them, keeping their names distinct. Then use osmium merge!

osmium extract -p city.geojson /app/data/illinois-latest.osm.pbf \
  -o /app/data/1.osm.pbf --overwrite \
  -s smart -S types=any

osmium extract -p city.geojson /app/data/missouri-latest.osm.pbf \
  -o /app/data/2.osm.pbf --overwrite \
  -s smart -S types=any

osmium merge /app/data/1.osm.pbf /app/data/2.osm.pbf \
  -o /app/data/composite.osm.pbf --overwrite

Importing the data: `osm2pgsql`

Osm2pgsql is an incredible tool for getting your OSM data into a relational database that other programs can work with. There’s more to it than we can cover in a single workshop, but we’ll cover the essentials to get you started.

Pros

well established and actively maintained
extremely configurable
allows “mid-stream” geometry processing
can export to different coordinate systems (with proj support)
lots of examples in the documentation

Cons

Updating requires a replication URL, not easily limited to your AOI
Steep learning curve for custom configuration

Default Output

To get the most common features and fields into your database, you can run the following command:

This is assuming you used the same compose file I did. If you’re working with your own database elsewhere, obviously you’ll need to adjust the connection string.

osm2pgsql /app/data/city-extract.osm.pbf \
-d postgres://gis:gis@database:5432/osm

Your Turn!

Run osm2pgsql against your extract with the default settings.
Check out the default output in QGIS or pgAdmin. You should have four new tables:
- planet_osm_line
- planet_osm_point
- planet_osm_polygon
- planet_osm_roads

You could work with that output, and even make some decent maps with it, but you’ll probably get tired of the proliferation of empty columns and filtering required to use it well.

The osm2pgsql maintainers strongly encourage taking the plunge and learning the flex output. So let’s do that!

Flex Output

This option uses a Lua file to define what features are imported, what fields / tags get included, and more. It’s a lot more involved! Let’s take a look at some examples.

There are lots of examples on the osm2pgsql GitHub page

Example: Traffic Roads

We’ll start with a simple, focused example: traffic roads. If you’re not familiar with OpenStreetMap tagging, that will be any way where the highway tag has any of the following:

motorway
trunk
primary
secondary
tertiary
unclassified
residential
service

The table

To begin, we need to define the table object. There are specific functions for different geometry types, but I like the generic define_table.

local roads = osm2pgsql.define_table({
  name = 'roads',
  ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
  columns = {
    { column = 'name' },
    { column = 'class' },
    { column = 'maxspeed' },
    { column = 'oneway' },
    { column = 'geom', type = 'point', not_null = true }
  }
})

Column attributes

When you define your columns, there are a number of properties you can define.

If a tag tends to have numeric data, you can try setting type and sql_type accordingly, to coerce the values to that type.
You can set not_null to require features to have a value in the field.
On geometry, you can set the geometry type and projection.

Processing

Next, we process the objects in our data file. We call osm2pgsql.process_way(object), and anything we put inside this function will be used against everything in our input data.

There’s also process_node and process_relation for those types of elements.

Within that function the object has a lot of fields and functions. Refer to the docs for the full list.

For us, all we really need is object.tags to look into a feature’s tags.

local highway_types = {
  'motorway',
  'motorway_link',
  'trunk',
  'trunk_link',
  'primary',
  'primary_link',
  'secondary',
  'secondary_link',
  'tertiary',
  'tertiary_link',
  'unclassified',
  'residential',
  'track',
  'service',
}

local types = {}
for _, k in ipairs(highway_types) do
  types[k] = 1
end

function osm2pgsql.process_way(object)
  if not types[object.tags.highway] then
    return
  end

  roads.insert({
    name = object.tags.name,
    class = object.tags.highway,
    maxspeed = object.tags.maxspeed,
    oneway = object.tags.oneway,
    geom = object:as_linestring()
  })
end

1: We define a list of highway tags we want.
2: We build a types object for later checking.
3: We perform the check on each object: is the type in our list?
4: If yes: add the feature to the table!

If this table was all we wanted, then we’d be all set!

Run the import

import-roads.sh

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/roads.lua

In this example, the -O flex option tells the program what output type we’re using, and -S [filename] gives it the Lua file that defines the output.

Your Turn!

Run an import of roads using the Lua script above. (Note that it’s split across two code blocks here, but the full file can be found [in the workshop])

Change the filters to get different highway features
Include some additional fields. Is the highway a bridge? A tunnel? How wide is it?

Example: Points of Interest

Let’s say we want a table of any shop or amenity points.

Here’s where some familiarity with OSM comes in handy. Depending on the regional conventions or the individual mapper’s preferences, a “point” of interest like an amenity or a shop may actually be mapped as an area! So, what do we do?

We won’t get into which method is “more correct”. The point is, we can account for this!

Easy: we tell our function to handle things a little differently based on the geometry type. Here’s the code:

pois.lua

local pois = osm2pgsql.define_table({ 
  name = 'pois',
  ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
  columns = {
    { column = 'name' },
    { column = 'class', not_null = true },
    { column = 'subclass' },
    { column = 'geom', type = 'point', not_null = true },
  },
  indexes = {
    { column = { 'name' }, method = 'btree' },
    { column = { 'geom' }, method = 'gist' }
  }
})

function process_poi(object, geom)
    local a = {
        name = object.tags.name,
        geom = geom,
        tags = object.tags
    }

    if object.tags.amenity then
        a.class = 'amenity'
        a.subclass = object.tags.amenity
    elseif object.tags.shop then
        a.class = 'shop'
        a.subclass = object.tags.shop
    else
        return
    end

    pois:insert(a)
end

function osm2pgsql.process_node(object)
    process_poi(object, object:as_point())
end

function osm2pgsql.process_way(object)
    if object.is_closed then
        process_poi(object, object:as_polygon():centroid())
    end
end

1: Our process_poi function, which takes an object and its geometry.
2: We process all our nodes, passing them directly into the poi function.
3: For ways, we check: are they closed? This usually means an area. If so, take the centroid of the area and pass that into the poi function.

Indexes

osm2pgsql will automatically add an index to the geometry column, but we can also tell the program to create other indices for us. We won’t get into the weeds on database indexes, but if you anticipate searching through one of your field (name, address, that sort of thing), it’s not a bad idea.

Run the import

import-pois.sh

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/pois.lua

Note the osm_type column, which shows W for ways, and N for nodes. If we only took one or the other, we’d be missing out on a lot features! Now no matter how the features are mapped, we’ll get a nice little point feature in our database.

Your Turn!

Run the POI import.

What about leisure and tourism? Add some points of interest of other types.
Are there any non-closed ways that we’d want the centroid of? How would you adjust the Lua script to catch those?

Example: Numbered Road Routes

One more basic example. We want numbered road routes in our map. In OpenStreetMap, these are mapped as relations.

Brief Note on Relations

Relations in OSM are all over the place, but each has:

A type, such as a route or a boundary
One to many members, being other nodes, ways, or relations
Each member may or may not have a role

How a relation works, the kinds of members and roles you could expect, and what any of it means can totally vary by relation type. Routes are easy enough, though! Route relation members are simply the road segments that are part of the route.

The thing with routes being made up of road segments is that your single Route 1 might be represented by hundreds of separate features. And any of those features might also be part of other routes, too! You either end up duplicating multi-route segments, or you vastly complicate the data being stored on your roads.

The approach I like for this is to merge the members into a single multiline geometry, per route.

routes.lua

local ways = osm2pgsql.define_table({
    name = 'routes',
    ids = { type = 'way', id_column = 'osm_id' },
    columns = {
        { column = 'id', sql_type = 'serial', create_only = true },
        { column = 'ref' },
        { column = 'name' },
        { column = 'network' },
        { column = 'geom', type = 'multilinestring', not_null = true}
    }
})

function osm2pgsql.process_relation(object)
    if object.tags.route == 'road' then
        ways:insert({
            tags = object.tags,
            geom = object:as_multilinestring():line_merge(),
            ref = object.tags.ref,
            name = object.tags.name,
            network = object.tags.network
        })
    end
end

1: Filter for road routes
2: Merge all ways together

Run the import

import-routes.sh

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/routes.lua

Your Turn!

Import the road routes table.

Adjust the lua file to include public transit routes. What fields should you include?

Multiple Tables

Let’s take things up a notch. Rarely will you be going to all this trouble just to get a single table. Here’s a bigger Lua file that gets us everything we need to make a decent map of a park or forest preserve.

See the code

parks.lua

local parks = osm2pgsql.define_table({
    name = 'parks',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'type' },
        { column = 'operator' },
        { column = 'access' },
        { column = 'housenumber' },
        { column = 'street' },
        { column = 'city' },
        { column = 'postcode' },
        { column = 'geom', type = 'polygon', not_null = true }
    }
})

local trails = osm2pgsql.define_table({
    name = 'trails',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'access' },
        { column = 'type' },
        { column = 'name' },
        { column = 'foot' },
        { column = 'bicycle' },
        { column = 'horse' },
        { column = 'surface' },
        { column = 'geom', type = 'linestring', not_null = true }
    }
})

local natural = osm2pgsql.define_table({
    name = 'natural_areas',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'type' },
        { column = 'geom', type = 'polygon', not_null = true }
    }
})

local buildings = osm2pgsql.define_table({
    name = 'buildings',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'geom', type = 'polygon', not_null = true }
    }
})

local amenity_areas = osm2pgsql.define_table({
    name = 'amenity_areas',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'type' },
        { column = 'access' },
        { column = 'geom', type = 'polygon', not_null = true }
    }
})

local amenity_points = osm2pgsql.define_table({
    name = 'amenity_points',
    ids = { type = 'any', type_column = 'osm_type', id_column = 'osm_id' },
    columns = {
        { column = 'name' },
        { column = 'type' },
        { column = 'access' },
        { column = 'geom', type = 'point', not_null = true }
    }
})

local highway_types = {
    'path',
    'footway',
    'bridleway',
    'track',
    'cycleway'
}

local types = {}
for _, k in ipairs(highway_types) do
  types[k] = 1
end

function osm2pgsql.process_way(object)

    -- trails
    if types[object.tags.highway] then
        trails:insert({
            name = object.tags.name,
            type = object.tags.highway,
            access = object.tags.access,
            foot = object.tags.foot,
            bicycle = object.tags.bicycle,
            horse = object.tags.horse,
            surface = object.tags.surface,
            geom = object:as_linestring()
        })
    end

    -- parks and forest preserves
    if object.tags.leisure == 'nature_reserve' or object.tags.leisure == 'park' then
        parks:insert({
            name = object.tags.name,
            type = object.tags.leisure,
            operator = object.tags.operator,
            access = object.tags.access,
            housenumber = object.tags['addr:housenumber'],
            street = object.tags['addr:street'],
            city = object.tags['addr:city'],
            postcode = object.tags['addr:postcode'],
            geom = object:as_polygon()
        })
    -- amenity areas not caught by above
    elseif object.tags.amenity or object.tags.leisure then
        local a = {
            name = object.tags.name,
            access = object.tags.access,
            geom = object:as_polygon()
        }

        if object.tags.amenity then
            a.type = object.tags.amenity
        else
            a.type = object.tags.leisure
        end

        amenity_areas:insert(a)

        if object.is_closed then
            a.geom = object:as_polygon():centroid()
            amenity_points:insert(a)
        end

    end

    -- natural areas
    if object.tags.natural then
        natural:insert({
            name = object.tags.name,
            type = object.tags.natural,
            geom = object:as_polygon()
        })
    end

    -- buildings
    if object.tags.building then
        buildings:insert({
            name = object.tags.name,
            geom = object:as_polygon()
        })
    end

end

-- amenity points
function osm2pgsql.process_node(object)
    if object.tags.amenity or object.tags.leisure then
        local a = {
            name = object.tags.name,
            access = object.tags.access,
            geom = object:as_point()
        }

        if object.tags.amenity then
            a.type = object.tags.amenity
        else
            a.type = object.tags.leisure
        end

        amenity_points:insert(a)
    end
end

Run the import

import-parks.sh

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/parks.lua

Your Turn!

Using the Lua file above as a starting point, or starting from scratch, import multiple tables.

For a quick guide to common OSM features and how they are tagged, check out the Map Features page of the OSM Wiki.

Catch-All with `jsonb` (or `hstore`)

Now we’re going to go in a completely different direction. One of my favorite things about OSM is the open-ended schema. Most anything you can think of can be mapped or tagged in some way, and I am constantly surprised by what I find in the data.

All the Tags

I might not know what tags I want included in my features right away. I could working on my map, then realize later that I should have included information about, say, a business’s email address. I’d have to go back to my Lua file, make the changes, and reimport, just to add a couple more tags. So why not take them all? It’s as simple as including a jsonb column (or hstore, if that’s your jam), then throwing the entire object.tags into it.

-- in the define_table section:
  columns = {
    { column = 'something_specific' },
    { column = 'other_tags', type = 'jsonb' },
  }

-- in the processing function:
  table:insert({
    something_specific = object.tags.some_tag
    other_tags = object.tags
  })

All the Features

Let’s take this even further. What if I don’t know which features I want to add? Maybe my map would look just perfect if I had waterslides as linear features, but I don’t have a line table to put them in. Now we’re not just adding new fields, but entire tables. Not the end of the world, and working with small extracts of OSM, imports go fast enough that re-running it is trivial.

But still, wouldn’t it be nice to have all the data at my fingertips? Let’s do it!

See the code

all.lua

local nodes = osm2pgsql.define_table({
    name = 'all_nodes',
    ids = { type = 'any', id_column = 'osm_id' },
    columns = {
        { column = 'id', sql_type = 'serial', create_only = true },
        { column = 'tags', type = 'jsonb' },
        { column = 'geom', type = 'point', not_null = true }
    }
})

local ways = osm2pgsql.define_table({
    name = 'all_ways',
    ids = { type = 'way', id_column = 'osm_id' },
    columns = {
        { column = 'id', sql_type = 'serial', create_only = true },
        { column = 'tags', type = 'jsonb' },
        { column = 'geom', type = 'multilinestring', not_null = true}
    }
})

local areas = osm2pgsql.define_table({
    name = 'all_areas',
    ids = { type = 'any', id_column = 'osm_id' },
    columns = {
        { column = 'id', sql_type = 'serial', create_only = true },
        { column = 'tags', type = 'jsonb' },
        { column = 'geom', type = 'multipolygon', not_null = true}
    }
})

function osm2pgsql.process_node(object)
    nodes:insert({
        tags = object.tags,
        geom = object:as_point()
    })
end

function osm2pgsql.process_way(object)
    if object.is_closed then
        areas:insert({
            tags = object.tags,
            geom = object:as_polygon()
        })
        
        -- get centroid of any closed way
        nodes:insert({
            tags = object.tags,
            geom = object:as_polygon():centroid()
        })
    end

    ways:insert({
        tags = object.tags,
        geom = object:as_linestring()
    })
end


function osm2pgsql.process_relation(object)
    if object.tags.type == 'route' or object.tags.type == 'waterway' then
        ways:insert({
            tags = object.tags,
            geom = object:as_multilinestring():line_merge()
        })
    end

    if object.tags.type == 'multipolygon' or object.tags.type == 'boundary' then
        areas:insert({
            tags = object.tags,
            geom = object:as_multipolygon()
        })
    end
end

Run the import

import-all.sh

osm2pgsql /app/data/city-extract.osm.pbf \
  -d postgres://gis:gis@database:5432/osm \
  -O flex -S /app/osm2pgsql-scenarios/all.lua

Your Turn!

Import EVERYTHING.

My example only included road route, waterway, and boundary relations. Add some other types to the mix.

Okay, cool, but… how are you supposed to use this? We’ll get to that later.

Using the Data: Other Programs

Before we play with the data in QGIS, we should mention a few other ways you can use your cool new database full of stuff.

If you’re using the compose file from the workshop repo, these other tools are included.

Serving Vector Tiles on the Fly

PostGIS has a really great function, ST_AsMVT, which returns the response of your query as a series of vector tiles. Here are a couple of tools that build onto this capability.

Martin

Martin is a tile server able to generate and serve vector tiles on the fly from large PostGIS databases, PMTiles (local or remote), and MBTiles files, allowing multiple tile sources to be dynamically combined into one. Martin optimizes for speed and heavy traffic, and is written in Rust.

See the Martin demo site

Composite sources are fun, you can basically pull every table you’ve got into a single vector tile “service”. Can work with file-based vector tiles as well as a database.

pg_tileserv

A PostGIS-only tile server in Go. Strip away all the other requirements, it just has to take in HTTP tile requests and form and execute SQL. In a sincere act of flattery, the API looks a lot like that of the Martin tile server.

See the pg_tileserv guide

Personally, I really like the built-in catalog. It’s otherwise very similar to Martin.

Serving Features

There’s a lot you can do with vector tiles these days, and you may never need to serve “real” features to your users. But if you did, there are lots of ways to do it. Basically any enterprise GIS system (Esri stuff, GeoServer, etc.,) can connect to a PostGIS database.

Since we’re already adjacent to it, looking at pg_tileserve, I did want to specifically mention one tool:

pg_featureserv

See the pg_featureserv guide

Similarly lightweight and straightforward. If all you need is to make your feature services available via a REST endpoint, it’s worth your time to check it out.

Using the Data: Desktop GIS

QGIS

Connecting to the Database

In the Browser panel, right-click on the PostgreSQL section and create a new connection. You can also add a connection from the Data Source Manager.

Your Turn!

Connect QGIS to your PostGIS Database!

You should see your imported tables in the public schema. (That is, if you didn’t specify some other schema in your imports!)

Add a Layer Directly

If your config is already pretty detailed and split into specific tables, you can open up the table list in the Browser Panel and add any layer to your project by right-clicking or just dragging and dropping.

Creating a Query Layer

Right-click the database connection and choose Execute SQL. In the window that opens, write any valid SQL expression that returns a unique ID and geometry, and you can choose to load it as a layer.

Using `hstore` and `jsonb` columns directly with `map_get`

In QGIS, we can actually add our tags field straight to the layer itself, and we will still be able to work with it.

In order to pull out a specific tag from this outside of SQL, QGIS’s expression editor has the function map_get. The “map” in map_get refers to what in other languages is a dictionary.

You can see in the Preview that a name is being pulled from the “tags” column, even though in the example pictured, that tag was not queried to its own column.

Overusing map_get isn’t going to perform quite as well as writing the query layer to include those tags, but it’s another way in which the data can remain extremely flexible as you work with it.

Your Turn!

Add one of your layers that has a tags column to the map. Use map_get to pull an attribute into a labeling expression.

Writing Queries w/ `hstore` and `jsonb`

When querying a “catch-all” column, the SQL is very simple. Just use -> for hstore and ->> for jsonb.

To be clear, the single arrow will work for jsonb as well, but that will return "some text" complete with the quotation marks, which is not usually what we want.

Sample Queries

Buildings

select
  osm_id,
  geom,
  tags ->> 'name' as name,
  tags
from osm2pgsql_areas
where tags ->> 'building' is not null

Landuse

select
    osm_id,
    geom,
    coalesce(tags ->> 'natural', tags ->> 'landuse') type,
    tags
from osm2pgsql_areas
where coalesce(tags ->> 'natural', tags ->> 'landuse') is not null

Tip

With a PostGIS DB, you have full access to a number of spatial SQL functions. This can be especially useful for putting certain spatial analysis steps into the query itself, avoiding static intermediate outputs.

Fifty Feet From a Bathroom

select
  osm_id,
  st_buffer(geom, 50) as geom_buff
from city_all_nodes
where tags ->> 'amenity' = 'toilets'

The sky’s the limit! This isn’t a spatial SQL workshop, but try things like:

Intersecting two layers
Checking for features within a certain distance using ST_DWithin
Aggregating attributes from nearby features
Clipping features to a political boundary

Your Turn!

Write a SQL query layer to return specific columns from an hstore or jsonb column.
Write a SQL query layer to modify the geometry in some way.

ArcGIS Pro

I know, it’s not FOSS, but it’s worth mentioning, in case you or someone you know wants to use OSM data and they’re working in Pro.

Just add a query layer and use the same SQL as before.

Caution

ArcGIS Pro does not like hstore or jsonb columns, and won’t know what to do if you try to bring it in as-is. You will need to use the ->> accessor to grab any specific tags you’re looking for.

Okay, but what am I using these layers for?

Once the layers are in your map, whether they are carefully defined in your import config or simply a SQL query, they can be used for anything that any other layer can be used for in Q or in ArcGIS. Probably. I haven’t tested all use cases yet. But here are a few that I have:

Publish to AGOL / Portal as a Hosted Feature Layer or Map Service
Use as GeoProcessing input
Build a street / address / point of interest locator
Use to create vector tiles
Just make really nice print maps

Including Metadata

You may not need the full metadata of OSM objects, but it can often be useful for quality control and validation purposes. “Metadata” refers to information like what version an object is, the user who last edited it, and what changeset was responsible. Full-metadata extracts are available from Geofabrik.

At Kendall County, we do just that. You can find our process in this repository, and you can make any changes you like to adapt it to your particular needs.

A note on metadata extracts

Per the Geofabrik website:

[Full metadata extract file] usage is governed by data protection regulations in the European Union. These regulations apply even to data processing that happens outside the European Union because some people whose data is contained in this files live in the European Union. These files may only be used for OpenStreetMap interal purposes, e.g. quality assurance. You must ensure that derived databases and works based on these files are only accessible to OpenStreetMap contributors.

A Final Note

Whatever you do with your data, please, please, please:

Don’t forget the attribution!

Attribution Guidelines

Overview

Goals

Associated Repo

Requirements

Data

Software

Optional

Software Alternatives: Docker / Podman

Run Your Own Containers

Isaac Boates’ Docker Images

Compose File

Sending Commands to your Docker Image

GUI

CLI

Create a geographic subset: osmium

Your AOI spans two (or more) files!

Importing the data: osm2pgsql

Default Output

Flex Output

Example: Traffic Roads

The table

Column attributes

Processing

Run the import

Example: Points of Interest

Indexes

Run the import

Example: Numbered Road Routes

Run the import

Multiple Tables

Run the import

Catch-All with jsonb (or hstore)

All the Tags

All the Features

Run the import

Using the Data: Other Programs

Serving Vector Tiles on the Fly

Martin

pg_tileserv

Serving Features

pg_featureserv

Using the Data: Desktop GIS

QGIS

Connecting to the Database

Add a Layer Directly

Creating a Query Layer

Using hstore and jsonb columns directly with map_get

Writing Queries w/ hstore and jsonb

Sample Queries

ArcGIS Pro

Okay, but what am I using these layers for?

Including Metadata

A Final Note

Create a geographic subset: `osmium`

Importing the data: `osm2pgsql`

Catch-All with `jsonb` (or `hstore`)

Using `hstore` and `jsonb` columns directly with `map_get`

Writing Queries w/ `hstore` and `jsonb`