Metrics & observability overview
rcfg-sim is built to be observed under load. Each server instance exposes Prometheus metrics and a health endpoint over HTTP.
The endpoints
--metrics-addr (default 0.0.0.0:9100) serves:
- GET /metrics: Prometheus exposition format
- GET /healthz: liveness check
Set --metrics-addr "" to disable the HTTP server.
curl -s http://127.0.0.1:9100/metrics | grep -E '^rcfgsim_'
curl -s http://127.0.0.1:9100/healthz
Bounded cardinality by design
Every metric label is a closed set: its values are pre-registered at startup and never derived from raw
user input. A client typing a thousand distinct garbage commands does not create a
thousand label values; they all roll up under CmdUnknown. This keeps Prometheus healthy even
when the simulator is abused, and a cardinality test in the project asserts it. Don’t
add new labels casually: labels are part of the public API.
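One way to check this from the outside is to count series per metric before and after sending junk commands. The sketch below is a hypothetical helper, not part of rcfg-sim; it assumes the default metrics address and that label values contain no spaces.

```shell
# series_count: read Prometheus exposition text on stdin and print the
# number of series per rcfgsim_ metric name, one "name count" pair per line.
# Hypothetical helper for illustration; assumes label values contain no spaces.
series_count() {
  awk '
    /^rcfgsim_/ {
      name = $1
      i = index(name, "{")               # strip the {label="..."} part, if any
      if (i) name = substr(name, 1, i - 1)
      count[name]++
    }
    END { for (n in count) print n, count[n] }
  ' | sort
}

# Usage against a running instance (address is the documented default):
#   curl -s http://127.0.0.1:9100/metrics | series_count > before.txt
#   ... send a batch of nonsense commands ...
#   curl -s http://127.0.0.1:9100/metrics | series_count > after.txt
#   diff before.txt after.txt && echo "cardinality unchanged"
```

If the label sets are closed as described, the two snapshots should list the same counts; only the sample values change.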
Because series are pre-registered at zero, the full metric set appears on the very first scrape, so you can alert on the absence of traffic, not just its presence.
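Since a pre-registered counter reports 0 rather than being missing, a total over rcfgsim_sessions_total is always a number, and "no traffic" reduces to comparing two scrapes. A minimal sketch under stated assumptions: metric_sum is a hypothetical helper, and it assumes exposition lines carry no timestamps and label values contain no spaces.

```shell
# metric_sum NAME: read exposition text on stdin and print the sum of all
# samples whose metric name is exactly NAME, across every label combination.
# Hypothetical helper; assumes "name{labels} value" lines without timestamps.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                        # skip HELP/TYPE comment lines
    {
      n = $1
      i = index(n, "{")
      if (i) n = substr(n, 1, i - 1)     # metric name without labels
      if (n == name) total += $2
    }
    END { printf "%d\n", total }
  '
}

# Usage: flag a minute with no new sessions (address is the documented default).
#   a=$(curl -s http://127.0.0.1:9100/metrics | metric_sum rcfgsim_sessions_total)
#   sleep 60
#   b=$(curl -s http://127.0.0.1:9100/metrics | metric_sum rcfgsim_sessions_total)
#   [ "$b" -eq "$a" ] && echo "no new sessions in the last 60s"
```

In a Prometheus deployment the same comparison would normally live in an alerting rule; the shell version is only meant for quick checks from a terminal.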
What to watch under load
- rcfgsim_active_sessions: how close you are to --max-concurrent-sessions.
- rcfgsim_sessions_total{result}: throughput and error mix (ok vs auth_fail / disconnect / error).
- rcfgsim_command_duration_seconds: per-command latency, including delays and slow_response faults.
- rcfgsim_bytes_sent_total: aggregate throughput.
- rcfgsim_faults_injected_total{type}: confirm faults fire at the expected rate.
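For a quick look at the error mix without a dashboard, the result label on rcfgsim_sessions_total can be broken down directly from a scrape. This is a rough sketch: session_mix is a hypothetical helper, and it assumes the label is spelled result as shown above and that exposition lines carry no timestamps.

```shell
# session_mix: read exposition text on stdin and print, for each value of the
# result label on rcfgsim_sessions_total, its count and share of all sessions.
# Hypothetical helper for illustration only.
session_mix() {
  awk '
    index($1, "rcfgsim_sessions_total{") == 1 && match($1, /result="[^"]*"/) {
      # RSTART points at the "r" of result; the value starts 8 characters later
      r = substr($1, RSTART + 8, RLENGTH - 9)
      count[r] += $2
      total += $2
    }
    END {
      for (r in count) {
        pct = (total > 0) ? 100 * count[r] / total : 0
        printf "%s %d %.1f%%\n", r, count[r], pct
      }
    }
  ' | sort
}

# Usage (address is the documented default):
#   curl -s http://127.0.0.1:9100/metrics | session_mix
```

The percentages are cumulative since process start; for a rate over a window, take two scrapes and compare the counts.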
The full list, with labels and histogram buckets, is in the metrics reference. Ready-to-paste queries are in Grafana queries.
Runtime and process metrics
Alongside the simulator metrics, the standard Go runtime collectors (goroutines, memory, GC) and process collectors (CPU, open file descriptors) are registered, which is useful for watching the host’s resource envelope as you scale up.