Configuring HAProxy for WMS Load Balancing

This walkthrough builds a production-safe HAProxy configuration that distributes Web Map Service (WMS) traffic across stateless rendering nodes while respecting OGC compliance, asymmetric payload sizes, and long-running tile generation.

This page is a hands-on procedure under Reverse Proxy Configuration for WMS/WFS, part of the broader Infrastructure Orchestration & Configuration Management practice. Where the parent page covers Nginx, HAProxy, and Traefik patterns at a conceptual level, this guide drills into the exact HAProxy directives that keep a GeoServer, MapServer, or QGIS Server fleet balanced and healthy.

Figure: load-balancing topology, with the HAProxy frontend distributing WMS traffic across renderer nodes and L7 health checks moving each node in and out of rotation.

Prerequisites

Confirm the following before editing haproxy.cfg. A mismatch here is the most common reason a balanced WMS fleet still returns 504 Gateway Timeout under load.

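HAProxy 2.4 or newer (LTS). The http-check send directive used in Step 3 requires HAProxy 2.2 or later. On older releases such as 1.8, the method and URI go on the option httpchk line instead (option httpchk GET /uri), while http-check expect status 200 still applies.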
At least two stateless renderer nodes — GeoServer 2.22+, MapServer 8.x, or QGIS Server 3.28+ — reachable on a known port (the examples use 8080).
A lightweight GetCapabilities endpoint on each node that returns HTTP 200 without authentication, used as the health probe target.
Write access to /etc/haproxy/haproxy.cfg and permission to reload the service (systemctl reload haproxy).
Renderer maximum execution time measured under realistic load — the worst-case GetMap render time drives the timeout server value in Step 1.
TLS certificate material in /etc/haproxy/certs/ if you terminate HTTPS at the proxy (Step 5).
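Before wiring the health check, it can help to confirm from an admin host that every node answers the probe anonymously. The following is a minimal Python sketch that assumes the GeoServer path and node addresses used in this guide (swap the path for MapServer or QGIS Server). The helper names probe_url, looks_healthy, and probe are hypothetical.

```python
# Minimal sketch: confirm each renderer answers the GetCapabilities probe
# without credentials. The /geoserver/wms path and port 8080 are assumptions
# taken from this guide's examples.
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CAPS_PARAMS = {"service": "WMS", "version": "1.3.0", "request": "GetCapabilities"}


def probe_url(host: str, port: int = 8080, path: str = "/geoserver/wms") -> str:
    """Build the same GetCapabilities URL the HAProxy health check requests."""
    return f"http://{host}:{port}{path}?{urlencode(CAPS_PARAMS)}"


def looks_healthy(status: int, body: bytes) -> bool:
    """True only for HTTP 200 carrying a WMS 1.3.0 capabilities document."""
    if status != 200:
        return False
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # The 1.3.0 root element is {http://www.opengis.net/wms}WMS_Capabilities.
    return root.tag.endswith("WMS_Capabilities")


def probe(host: str, timeout: float = 5.0) -> bool:
    """Fetch the probe URL without credentials and evaluate the response."""
    try:
        with urllib.request.urlopen(probe_url(host), timeout=timeout) as resp:
            return looks_healthy(resp.status, resp.read())
    except OSError:
        # URLError, HTTPError (e.g. a 401 from a secured endpoint) and socket
        # timeouts are all OSError subclasses.
        return False
```

Running all(probe(h) for h in ("10.0.1.10", "10.0.1.11")) should return True before you move on. A False usually means the endpoint needs authentication or is returning an HTML login page.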

Step-by-step Implementation

Step 1 — Set safe defaults and WMS-aware timeouts

WMS GetCapabilities and GetLegendGraphic calls are small and cacheable, but GetMap and GetFeatureInfo trigger dynamic rendering pipelines that consume significant CPU, memory, and I/O. A timeout server copied from a generic web template is usually far shorter than a worst-case render, so HAProxy cuts off a complex spatial join or large-extent rasterization before it finishes and answers the client with 504 Gateway Timeout. Set timeout server to the renderer's measured maximum execution time so slow renders complete.

defaults
    mode http
    log global                    # send proxy logs to the global log target (Step 6)
    timeout connect 5s
    timeout client  30s
    timeout server  120s          # match the renderer's worst-case GetMap time
    timeout queue   30s           # cap how long a request waits for a free server
    timeout http-request    10s   # guard against slow-loris clients sending headers slowly
    timeout http-keep-alive 10s
    option httplog
    option dontlognull
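One way to turn measured render times into a timeout server value is to pad the slowest observation and round up. The sketch below assumes a 25% headroom and a 10-second rounding step. Both are arbitrary illustrative choices, not HAProxy guidance.

```python
# Sketch: pad the slowest measured GetMap render and round up to a clean value.
# The 25% headroom and 10 s rounding step are assumptions, not HAProxy guidance.
import math


def timeout_server_seconds(render_times_s: list[float],
                           headroom: float = 0.25, step_s: int = 10) -> int:
    """Return a timeout server budget (seconds) covering the worst measured render."""
    if not render_times_s:
        raise ValueError("need at least one measured render time")
    padded = max(render_times_s) * (1.0 + headroom)
    # Round up to the next multiple of step_s so the budget never undercuts the padding.
    return math.ceil(padded / step_s) * step_s


def timeout_directive(render_times_s: list[float]) -> str:
    """Format the value as the defaults-section directive used in Step 1."""
    return f"timeout server  {timeout_server_seconds(render_times_s)}s"


print(timeout_directive([12.4, 48.0, 91.7]))  # timeout server  120s
```

With a slowest observed render of 91.7 s, the padded value is about 114.6 s, which rounds up to the 120s used above.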

Step 2 — Declare the renderer backend and distribution algorithm

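Route baseline traffic with roundrobin so requests, including the cacheable metadata calls, spread evenly across the nodes. static-rr distributes in the same way but ignores server weight changes made at runtime, so it is an alternative only when weights stay fixed. The check keyword arms the health probe configured in the next step.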

backend wms_renderers
    balance roundrobin
    server node1 10.0.1.10:8080 check
    server node2 10.0.1.11:8080 check
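To see how weights shape the rotation, here is a simplified Python model of weighted round-robin. It uses the "smooth weighted" variant popularised by Nginx, which is a swapped-in technique: HAProxy's own scheduler is implemented differently. The model only illustrates the share each server receives per cycle.

```python
# Simplified model of weighted round-robin (the "smooth weighted" variant used by
# Nginx). HAProxy's scheduler is implemented differently; this only shows the
# per-cycle share each server receives.
from collections import Counter


def smooth_wrr(weights: dict[str, int], n: int) -> list[str]:
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        for name, weight in weights.items():
            current[name] += weight             # every server earns its weight
        chosen = max(current, key=current.get)  # highest credit wins (first on ties)
        current[chosen] -= total                # winner pays back the total weight
        picks.append(chosen)
    return picks


print(smooth_wrr({"node1": 1, "node2": 1}, 4))            # ['node1', 'node2', 'node1', 'node2']
print(Counter(smooth_wrr({"node1": 2, "node2": 1}, 30)))  # Counter({'node1': 20, 'node2': 10})
```

With equal weights (HAProxy's default server weight is 1), the two nodes alternate. Giving a larger node weight 2 doubles its share without starving the smaller one.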

Step 3 — Add geospatial-aware health validation

A TCP port check cannot tell whether a renderer has exhausted its JVM heap, locked a shapefile index, or hit a GDAL projection-cache failure. Probe a real GetCapabilities request and require an HTTP 200, so HAProxy only routes to nodes that can actually answer OGC calls. Use inter 10s fall 3 rise 2 to avoid flapping during high-concurrency spikes.

backend wms_renderers
    balance roundrobin
    option httpchk
    http-check send meth GET uri /geoserver/wms?service=WMS&version=1.3.0&request=GetCapabilities
    http-check expect status 200
    server node1 10.0.1.10:8080 check inter 10s fall 3 rise 2
    server node2 10.0.1.11:8080 check inter 10s fall 3 rise 2
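The effect of inter 10s fall 3 rise 2 can be reasoned about with a small state model. This Python sketch follows the documented semantics: a server is marked DOWN after fall consecutive failures and UP again after rise consecutive successes. It is a simplification. HAProxy's checker keeps an internal health counter, so heavily interleaved results may not match the model exactly.

```python
# Simplified model of the documented rise/fall behaviour: a server changes state
# after `fall` consecutive failures (UP -> DOWN) or `rise` consecutive successes
# (DOWN -> UP). HAProxy's real checker keeps a health counter, so interleaved
# results may diverge from this model.


def simulate_checks(results: list[bool], fall: int = 3, rise: int = 2,
                    start_up: bool = True) -> list[str]:
    up = start_up
    streak = 0  # consecutive results that contradict the current state
    states = []
    for ok in results:
        if ok == up:
            streak = 0
        else:
            streak += 1
            if streak >= (fall if up else rise):
                up, streak = not up, 0
        states.append("UP" if up else "DOWN")
    return states


def worst_case_seconds(inter_s: float, count: int) -> float:
    """Approximate time for `count` checks spaced `inter_s` apart (ignores check duration)."""
    return inter_s * count


print(simulate_checks([False, False, True, False]))  # intermittent failures never reach fall=3
print(worst_case_seconds(10, 3), worst_case_seconds(10, 2))  # 30 20
```

The trade-off is visible in the numbers. A dead node keeps receiving traffic for roughly 30 seconds before removal, and a recovered node waits about 20 seconds to rejoin. In exchange, a short burst of slow probes during a concurrency spike does not eject a healthy renderer.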

Step 4 — Preserve client IPs and integrate cache affinity

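WMS is stateless, so session persistence is rarely needed for rendering. But when a downstream tile cache such as GeoWebCache sits behind the proxy, source-IP hashing keeps a given client pinned to one cache node and prevents tile fragmentation. balance source hashes the address HAProxy sees on the TCP connection, so behind a cloud load balancer or NAT gateway every request appears to come from the same few addresses and lands on the same cache node. option forwardfor does not change that; it only tells the renderers who the client was. To hash on the real client, accept the PROXY protocol on the bind line (accept-proxy) or, when HAProxy is reachable only through the upstream load balancer, take the client address from its X-Forwarded-For header with http-request set-src. Keep option forwardfor enabled for renderer logging, and confirm the renderer trusts X-Forwarded-For only from the proxy.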

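frontend wms_frontend
    bind *:80                     # add accept-proxy if the upstream LB sends the PROXY protocol
    # http-request set-src hdr(x-forwarded-for)   # alternative: trust the upstream LB's header
    option forwardfor             # inject X-Forwarded-For for downstream logging
    default_backend wms_renderers # use wms_cache_aware when GeoWebCache sits behind the proxy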

backend wms_cache_aware
    balance source                # pin a client to one cache node
    hash-type consistent          # minimise reshuffling when a node leaves rotation
    server cache1 10.0.2.10:8080 check
    server cache2 10.0.2.11:8080 check

Ensure the cache layer honours the Cache-Control headers the renderer emits, so a rapid dataset update does not keep serving stale spatial data. Connection-pooling concerns for the database tier behind these renderers are covered in Optimizing PostgreSQL/PostGIS Connection Limits.
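Why hash-type consistent limits reshuffling can be shown with a toy hash ring in Python. This is not HAProxy's implementation: the hash function, number of points per server, and weight handling all differ. cache3 is a hypothetical third node added so that removing one node still leaves a choice.

```python
# Toy consistent-hash ring with virtual nodes, showing why `hash-type consistent`
# limits reshuffling. Not HAProxy's implementation; cache3 is hypothetical.
import bisect
import hashlib


def _point(key: str) -> int:
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each node owns many points on the ring to even out the key spread.
        self._ring = sorted((_point(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def lookup(self, client_ip: str) -> str:
        # Walk clockwise to the first node point at or after the key's hash.
        idx = bisect.bisect_left(self._points, _point(client_ip)) % len(self._ring)
        return self._ring[idx][1]


clients = [f"198.51.100.{i}" for i in range(1, 201)]
before = HashRing(["cache1", "cache2", "cache3"])
after = HashRing(["cache1", "cache3"])  # cache2 left rotation
moved = [c for c in clients if before.lookup(c) != after.lookup(c)]
# Only clients that were pinned to cache2 change node; everyone else keeps their cache.
print(len(moved), all(before.lookup(c) == "cache2" for c in moved))
```

The same model shows the NAT problem. If every request reaches HAProxy from one upstream address, lookup returns the same node for all of them, and one cache node absorbs the whole fleet's traffic.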

Step 5 — Isolate tenants with ACLs at the frontend

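Multi-tenant agency deployments can share one proxy instance and still keep production, staging, and internal services apart. Route on a path prefix or the Host header rather than standing up separate proxies, which cuts operational overhead while preserving segmentation. The path ACLs do not rewrite the URL, so the prod and staging renderers must either serve under /prod/ and /staging/ or strip the prefix in their backend (for example with http-request replace-path).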

frontend wms_multi_tenant
    bind *:443 ssl crt /etc/haproxy/certs/
    acl is_prod     path_beg /prod/
    acl is_staging  path_beg /staging/
    acl is_internal hdr(host) -i internal.maps.agency.gov

    use_backend prod_wms     if is_prod
    use_backend staging_wms  if is_staging
    use_backend internal_wms if is_internal
    default_backend wms_renderers
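Because use_backend rules are evaluated in order and the first match wins, it helps to test routing decisions before deploying. The Python sketch below models only the three ACLs above. Backend names mirror the config, and the request hosts and paths are hypothetical.

```python
# Minimal model of first-match use_backend evaluation for the rules above.
# Backend names mirror the config; the example hosts and paths are hypothetical.
from typing import Callable, NamedTuple


class Request(NamedTuple):
    host: str
    path: str


RULES: list[tuple[str, Callable[[Request], bool]]] = [
    ("prod_wms", lambda r: r.path.startswith("/prod/")),        # path_beg /prod/
    ("staging_wms", lambda r: r.path.startswith("/staging/")),  # path_beg /staging/
    # hdr(host) -i: whole-value match, case-insensitive
    ("internal_wms", lambda r: r.host.lower() == "internal.maps.agency.gov"),
]


def pick_backend(req: Request, default: str = "wms_renderers") -> str:
    for backend, matches in RULES:  # evaluated in declaration order
        if matches(req):
            return backend          # first match wins
    return default


print(pick_backend(Request("internal.maps.agency.gov", "/staging/wms")))  # staging_wms
print(pick_backend(Request("internal.maps.agency.gov:443", "/wms")))     # wms_renderers
```

The first call shows ordering: a staging path on the internal host goes to staging_wms because that rule comes first. The second shows that a Host header carrying an explicit port does not equal the exact string in hdr(host) -i, so the request falls through to the default backend.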

Step 6 — Turn on observability for the rendering path

Enable health-check logging and the stats socket so you can correlate proxy behaviour with renderer internals. The runtime socket lets you query live queue depth and retry counters without a reload, which is essential when diagnosing intermittent timeouts. The fleet topology behind this proxy — and how to keep renderer containers healthy — is detailed in Containerizing TileServer GL for High Availability.

global
    log /dev/log local0
    stats socket /run/haproxy/admin.sock mode 660 level admin

backend wms_renderers
    option log-health-checks      # log every transition in/out of rotation
    # ...server lines from Step 3...
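The runtime socket can also be read from a script, for example to feed a dashboard or alert. This Python sketch sends show stat over the socket path configured in global and keeps the status, queue depth (qcur), and retry counter (wretr) columns. It assumes the calling user has permission to open the socket. The function names are hypothetical.

```python
# Sketch: read live per-server state from the runtime API socket configured in
# global. Field names follow the `show stat` CSV header.
import csv
import io
import socket


def show_stat(sock_path: str = "/run/haproxy/admin.sock") -> str:
    """Send one command; in non-interactive mode HAProxy closes after replying."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        sock.sendall(b"show stat\n")
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    return b"".join(chunks).decode()


def server_summary(raw: str) -> dict[tuple[str, str], tuple[str, int, int]]:
    """Map (proxy, server) -> (status, qcur, wretr), skipping FRONTEND/BACKEND rows."""
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))  # header starts with "# "
    summary = {}
    for row in reader:
        if not row.get("pxname") or row["svname"] in ("FRONTEND", "BACKEND"):
            continue
        summary[(row["pxname"], row["svname"])] = (
            row["status"], int(row["qcur"] or 0), int(row["wretr"] or 0))
    return summary
```

Calling server_summary(show_stat()) on a running proxy returns one entry per renderer, which is the same data the socat pipeline in the Verification section prints.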

Verification

Validate the configuration before reloading, then confirm the live state once HAProxy is running.

# 1. Syntax-check the config without applying it
haproxy -c -f /etc/haproxy/haproxy.cfg

# 2. Reload without dropping established connections
systemctl reload haproxy

# 3. Confirm both renderers are UP and taking traffic
echo "show stat" | socat stdio /run/haproxy/admin.sock | \
  cut -d, -f1,2,18,19 | column -s, -t   # proxy, server, status, weight

# 4. Drive a real GetMap through the proxy and check it returns 200 with image bytes
#    (WMS 1.3.0 takes CRS, not SRS, and EPSG:4326 uses lat,lon axis order in the bbox)
curl -s -o /dev/null -w "%{http_code} %{content_type} %{time_total}s\n" \
  "http://localhost/geoserver/wms?service=WMS&version=1.3.0&request=GetMap&layers=topp:states&styles=&crs=EPSG:4326&bbox=24,-130,50,-66&width=512&height=256&format=image/png"

# 5. Confirm health-check transitions are being logged
journalctl -u haproxy --since "5 min ago" | grep -i "Health check"

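A healthy result shows both servers UP in the stat output, the GetMap returning 200 image/png within the timeout server budget, and no failed health checks in the log. The check_status column of show stat reports the last check result with codes such as L7OK, L7STS (unexpected HTTP status), or L4CON (connection failure).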
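A 200 status code alone does not prove a render succeeded, because many WMS servers return an OGC ServiceException document with HTTP 200. This Python sketch inspects the payload as well. The 130-second client timeout is an assumption chosen to sit just above the 120s timeout server, so HAProxy's 504 arrives before the client gives up.

```python
# Sketch: classify a GetMap response by status, content type, and payload bytes.
import urllib.error
import urllib.request

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"


def classify_getmap(status: int, content_type: str, body: bytes) -> str:
    if status != 200:
        return f"http-{status}"             # e.g. http-504 when timeout server fires
    mime = content_type.split(";")[0].strip().lower()
    if mime == "image/png" and body.startswith(PNG_MAGIC):
        return "ok"
    if b"ServiceException" in body[:4096]:
        return "service-exception"          # renderer rejected or failed the request
    return "unexpected"


def check_getmap(url: str, timeout_s: float = 130.0) -> str:
    # Client timeout slightly above timeout server (120s) so HAProxy's 504 arrives first.
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return classify_getmap(resp.status, resp.headers.get("Content-Type", ""), resp.read())
    except urllib.error.HTTPError as err:   # non-2xx responses raise HTTPError
        return classify_getmap(err.code, err.headers.get("Content-Type", ""), err.read())
```

Pointing check_getmap at the GetMap URL from command 4 should return "ok". A "service-exception" result points at the layer or request parameters rather than at the proxy.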

Troubleshooting Matrix

Symptom	Likely cause	Fix
`504 Gateway Timeout` on `GetMap` only	`timeout server` shorter than worst-case render time	Raise `timeout server` to the renderer’s measured maximum; profile slow layers separately
All servers flap `UP`/`DOWN`	Health check hits an authenticated or heavy URL	Point `http-check send uri` at an unauthenticated `GetCapabilities`; widen `inter`, raise `fall`
`503 Service Unavailable`, no backend	Every node failed the L7 check	Inspect `journalctl -u haproxy` for failed health checks (`check_status` `L7STS` in `show stat`); verify the `GetCapabilities` path and expected status
`502 Bad Gateway` bursts	Renderer process crashing or resetting connections	Check renderer logs (JVM heap, GDAL cache); inspect rising retries (`wretr`) via the stats socket
Cache hit rate collapses behind NAT or an upstream load balancer	`balance source` hashes the upstream's address, not the client's	Accept the PROXY protocol (`accept-proxy` on `bind`) or set the source from the upstream's header with `http-request set-src hdr(x-forwarded-for)`; keep `hash-type consistent`
Requests queue then time out under load	Backend saturated, `timeout queue` reached	Add renderer nodes or raise per-server `maxconn`; watch queue depth (`qcur`) on the stats socket
Wrong tenant served on shared proxy	ACL order or `Host`/path mismatch	Re-order `use_backend` rules; confirm `hdr(host) -i` matches exactly and `path_beg` prefixes are correct

When queue depth and retries climb together, the renderers — not the proxy — are the bottleneck; scale the backend before retuning timeouts.
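The rule of thumb above can be turned into a simple check over sampled stats. This is a hypothetical heuristic, not an HAProxy feature. It flags the renderers when the queue gauge (qcur) and the cumulative retry counter (wretr) both rise across a sampling window. The window length and sampling interval are left to you.

```python
# Hypothetical heuristic: flag the renderers as the bottleneck when queue depth
# (qcur, a gauge) and retries (wretr, a cumulative counter) both rise together.


def rising(samples: list[int]) -> bool:
    """True when the series never drops and ends higher than it started."""
    return (len(samples) >= 2 and samples[-1] > samples[0]
            and all(b >= a for a, b in zip(samples, samples[1:])))


def renderers_are_bottleneck(qcur: list[int], wretr: list[int]) -> bool:
    return rising(qcur) and rising(wretr)


print(renderers_are_bottleneck([0, 2, 5, 9], [10, 12, 15, 20]))  # True: scale the backend
print(renderers_are_bottleneck([0, 2, 5, 9], [10, 10, 10, 10]))  # False: queueing without retries
```

Queueing without retries points more toward maxconn or timeout tuning, while both rising together matches the scale-the-backend case described above.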

Related

Reverse Proxy Configuration for WMS/WFS — the parent guide covering Nginx, HAProxy, and Traefik patterns for OGC ingress.
Containerizing TileServer GL for High Availability — building the renderer fleet this proxy balances.
Optimizing PostgreSQL/PostGIS Connection Limits — tuning the database tier behind the renderers.
