Configuring HAProxy for WMS Load Balancing
Deploying a resilient Web Map Service (WMS) architecture requires precise traffic distribution, particularly when scaling open-source rendering engines such as GeoServer, MapServer, or QGIS Server across stateless compute nodes. HAProxy serves as the critical ingress layer, but default HTTP load balancing heuristics frequently fail under geospatial workloads due to asymmetric payload sizes, long-running tile generation, and strict OGC compliance requirements. Proper configuration demands explicit attention to request routing, backend health validation, and timeout calibration. This guide outlines production-safe patterns for Infrastructure Orchestration & Configuration Management workflows targeting geospatial platform scaling.
The load-balancing topology below shows the HAProxy frontend distributing WMS traffic across renderer nodes, with L7 health checks driving each node in and out of rotation.
flowchart LR
Client["WMS clients"] --> FE["HAProxy frontend — TLS, ACLs"]
FE --> LB{"balance roundrobin"}
subgraph BE [wms_renderers backend]
N1["node1 10.0.1.10:8080"]
N2["node2 10.0.1.11:8080"]
end
LB --> N1
LB --> N2
BE -. "httpchk GetCapabilities, expect 200" .-> FE
Request Routing and Load Distribution
The foundational routing logic must distinguish between lightweight metadata requests and computationally intensive raster operations. WMS GetCapabilities and GetLegendGraphic calls are highly cacheable and should be routed using roundrobin or static-rr to distribute baseline load evenly across the cluster. Conversely, GetMap and GetFeatureInfo operations trigger dynamic rendering pipelines that consume significant CPU, memory, and I/O bandwidth.
HAProxy returns 504 Gateway Timeout when a backend does not start responding within timeout server, so that value must cover your rendering engine's maximum execution time for complex spatial joins or large-extent rasterization. HAProxy ships with no built-in timeout values, so each one has to be declared explicitly. Note that timeout http-request only bounds how long a client may take to send its complete request headers; it does not limit rendering time and can stay short. For comprehensive architectural patterns, refer to the Reverse Proxy Configuration for WMS/WFS documentation.
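defaults
    mode http
    timeout connect 5s              # TCP connect to a backend node
    timeout client 30s              # client-side inactivity
    timeout http-request 10s        # time allowed to receive complete request headers
    timeout http-keep-alive 10s     # idle time between requests on a kept-alive connection

backend wms_renderers
    balance roundrobin
    timeout server 120s             # must exceed the slowest expected render
    timeout queue 30s               # max wait in queue for a free connection slot
    server node1 10.0.1.10:8080 check
    server node2 10.0.1.11:8080 check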
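The distinction between metadata and rendering requests can be expressed directly in a frontend. The following is a minimal sketch rather than a tested configuration: the frontend name wms_split and the backend names wms_metadata and wms_render are hypothetical, and the regular expression only matches key-value (GET) requests, so XML POST requests would fall through to the default backend. It uses leastconn for the rendering pool on the assumption that render times vary widely; benchmark both algorithms before adopting it.

```haproxy
frontend wms_split                          # hypothetical name
    bind *:80
    # WMS parameter names are case-insensitive, so match the query string with -i.
    acl is_render query -m reg -i '(^|&)request=(getmap|getfeatureinfo)(&|$)'
    use_backend wms_render if is_render
    default_backend wms_metadata

backend wms_metadata                        # hypothetical: GetCapabilities, GetLegendGraphic
    balance roundrobin
    timeout server 30s                      # assumption: metadata responses are fast
    server node1 10.0.1.10:8080 check
    server node2 10.0.1.11:8080 check

backend wms_render                          # hypothetical: GetMap, GetFeatureInfo
    balance leastconn                       # assumption: favor nodes with fewer in-flight renders
    timeout server 120s
    server node1 10.0.1.10:8080 check
    server node2 10.0.1.11:8080 check
```

Both backends point at the same nodes here; the split only changes how requests are distributed and how long HAProxy waits for each class of request.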
Geospatial-Aware Backend Health Validation
Backend health checks require geospatial-specific validation rather than generic TCP probes. A simple port check cannot verify whether a rendering engine has exhausted its JVM heap, locked a shapefile index, or encountered a GDAL projection cache failure. Implementing HTTP-based health checks against a lightweight GetCapabilities endpoint ensures HAProxy only routes traffic to nodes capable of fulfilling OGC requests.
The check interval should be calibrated to avoid overwhelming the backend during high-concurrency periods. A typical configuration uses inter 10s fall 3 rise 2, with option httpchk defining the probe request and http-check expect validating the HTTP 200 response. Advanced deployments can match the XML service metadata in the response body for version compliance or specific capability flags, as outlined in the OGC Web Map Service Implementation Specification.
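backend wms_renderers
    # Probe a real OGC request so only nodes that can answer WMS traffic stay in rotation.
    option httpchk GET /geoserver/wms?service=WMS&version=1.3.0&request=GetCapabilities
    http-check expect status 200
    # Check every 10s; mark a node down after 3 failures and up again after 2 successes.
    server node1 10.0.1.10:8080 check inter 10s fall 3 rise 2
    server node2 10.0.1.11:8080 check inter 10s fall 3 rise 2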
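Where a 200 status alone is not enough, the check can also inspect the response body. The sketch below assumes the backend returns a WMS 1.3.0 capabilities document whose root element is WMS_Capabilities, and that this element falls within HAProxy's check buffer (only the beginning of a large capabilities document is available for matching). The backend name wms_renderers_strict is hypothetical.

```haproxy
backend wms_renderers_strict                # hypothetical name
    option httpchk GET /geoserver/wms?service=WMS&version=1.3.0&request=GetCapabilities
    # Pass only if the body opens a WMS 1.3.0 capabilities document.
    http-check expect rstring '<WMS_Capabilities[^>]*version="1\.3\.0"'
    server node1 10.0.1.10:8080 check inter 10s fall 3 rise 2
    server node2 10.0.1.11:8080 check inter 10s fall 3 rise 2
```

A body match of this kind also catches servers that report a service exception with an HTTP 200 status, which a status-only check would accept.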
Session Persistence and Caching Integration
Session persistence is rarely required for standard WMS deployments, as the protocol is inherently stateless. However, when integrating with downstream caching layers like GeoWebCache or TileCache, source IP hashing (balance source) may be necessary to prevent cache fragmentation and ensure consistent tile delivery.
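Be aware that balance source hashes the source address of the TCP connection. When HAProxy sits behind NAT or a cloud load balancer, every request appears to come from the same few addresses and the hash collapses onto one or two servers. option forwardfor does not change this: it only adds an X-Forwarded-For header to the requests HAProxy sends to the backends. To hash on the real client address, either have the upstream proxy speak the PROXY protocol (accept-proxy on the bind line) or balance on the forwarded header with balance hdr(X-Forwarded-For). Additionally, ensure your caching layer respects the Cache-Control headers emitted by the rendering backend to avoid serving stale spatial data during rapid dataset updates.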
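frontend wms_frontend
    bind *:80
    option forwardfor               # adds X-Forwarded-For to requests sent to backends
    default_backend wms_renderers

backend wms_cache_aware
    balance source                  # hash the connection's source address
    hash-type consistent            # limit remapping when a server is added or removed
    server cache1 10.0.2.10:8080 check
    server cache2 10.0.2.11:8080 check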
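If HAProxy sits behind another proxy or a cloud load balancer, balance source only ever sees that upstream address. A minimal sketch of one alternative, assuming the upstream sets a trustworthy X-Forwarded-For header, is to hash on the header value instead of the TCP source address. The backend name wms_cache_by_client is hypothetical.

```haproxy
backend wms_cache_by_client                 # hypothetical name
    # Hash the X-Forwarded-For value; requests without the header
    # fall back to roundrobin.
    balance hdr(X-Forwarded-For)
    hash-type consistent
    server cache1 10.0.2.10:8080 check
    server cache2 10.0.2.11:8080 check
```

If the upstream speaks the PROXY protocol instead, adding accept-proxy to the bind line makes the real client address visible to HAProxy, and balance source can be kept unchanged.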
Multi-Tenant Isolation via ACLs
In multi-tenant agency deployments, Access Control Lists (ACLs) can route traffic based on the Host header (req.hdr(Host)) or URI path prefixes to separate production, staging, and internal data services without deploying separate proxy instances. This approach reduces operational overhead while keeping each tenant's traffic on its own backend pool.
frontend wms_multi_tenant
    bind *:443 ssl crt /etc/haproxy/certs/
    # Path-based tenants
    acl is_prod path_beg /prod/
    acl is_staging path_beg /staging/
    # Host-based tenant
    acl is_internal hdr(host) -i internal.maps.agency.gov
    # The first matching rule wins; with no default_backend, unmatched requests get a 503.
    use_backend prod_wms if is_prod
    use_backend staging_wms if is_staging
    use_backend internal_wms if is_internal
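Routing by ACL does not by itself restrict who can reach a tenant. As a sketch, the same frontend could reject requests for the internal host that originate outside a trusted range and give unmatched requests an explicit destination. The 10.0.0.0/8 range and the public_wms backend are assumptions for illustration, and src is the connection's source address, so this only works as intended when HAProxy sees clients directly.

```haproxy
frontend wms_multi_tenant
    bind *:443 ssl crt /etc/haproxy/certs/
    acl is_prod path_beg /prod/
    acl is_staging path_beg /staging/
    acl is_internal hdr(host) -i internal.maps.agency.gov
    acl from_intranet src 10.0.0.0/8            # assumed trusted range
    # Reject internal-host requests that originate outside the trusted range.
    http-request deny deny_status 403 if is_internal !from_intranet
    use_backend prod_wms if is_prod
    use_backend staging_wms if is_staging
    use_backend internal_wms if is_internal
    default_backend public_wms                  # hypothetical catch-all backend
```

Because the first matching use_backend rule wins, a request to the internal host with a /prod/ path would still be sent to prod_wms; reorder the rules if the Host match should take priority.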
Observability, Logging, and Troubleshooting
Debugging HAProxy under geospatial load requires structured log analysis and metric correlation. Enable option httplog and option log-health-checks to capture backend response codes, queue times, and connection states. Correlate these logs with application-level metrics (e.g., JVM garbage collection pauses, GDAL cache hit rates, or PostgreSQL query execution times) to identify bottlenecks.
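A minimal logging setup for the directives above might look like the following. The syslog target /dev/log and the facility local0 are assumptions; substitute whatever your log collector expects.

```haproxy
global
    log /dev/log local0                 # assumed local syslog socket and facility

defaults
    mode http
    log global                          # inherit the target declared in global
    option httplog                      # timers, status code, queue depths, termination state
    option log-health-checks            # log health check results and server state changes
```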
When troubleshooting intermittent 502 or 504 errors, inspect the srv_queue and retries fields that option httplog records on each log line. High queue depths typically indicate backend saturation, while frequent retries suggest transient network instability or rendering engine crashes. For detailed log format specifications and statistical counters, consult the official HAProxy Configuration Manual. The GeoServer Administration Guide is also useful when correlating proxy-level timeouts with server-side thread pool exhaustion.
Implement centralized log aggregation using structured formats (JSON or RFC 5424) to enable rapid filtering by HTTP_METHOD, REQUEST_URI, and BACKEND_RESPONSE_TIME. This telemetry is essential for capacity planning and for identifying spatial query patterns that consistently degrade rendering performance.
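As one way to emit such records, a custom log-format can assemble a JSON-like line from HAProxy's log variables. This is a sketch: the key names are arbitrary choices, and the values are not JSON-escaped, so a URI containing a double quote would produce an invalid record.

```haproxy
defaults
    mode http
    log global
    # %HM method, %HU request URI, %b backend, %s server, %ST status code,
    # %Tw time spent in queue (ms), %Tr server response time (ms),
    # %Ta active time of the request (ms), %sq server queue depth,
    # %rc retries (quoted because it may carry a "+" prefix), %ts termination state
    log-format '{"method":"%HM","uri":"%HU","backend":"%b","server":"%s","status":%ST,"tw_ms":%Tw,"tr_ms":%Tr,"ta_ms":%Ta,"srv_queue":%sq,"retries":"%rc","term":"%ts"}'
```

Filtering on tr_ms and srv_queue together helps separate slow renders (high response time, empty queue) from saturation (requests waiting in queue before they reach a node).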