Load-test scenarios
A staged approach to scaling up. Each rung validates something before you commit to the next;
the repo’s TEST-SCENARIOS.md covers the full progression in depth.
The ladder
Section titled “The ladder”100 → 1,000 → 5,000 → 25,000 → 50,000smoke single IP multi-IP half fleet full fleet1. Smoke test (100 devices, loopback)
Section titled “1. Smoke test (100 devices, loopback)”Prove the toolchain end to end: generate, serve, connect, scrape. This is the Quickstart. Goal: a clean session and moving metrics.
2. Single-IP load (1,000 devices)
Section titled “2. Single-IP load (1,000 devices)”One IP, 1,000 ports. Point your collector at the range and run a full backup cycle. Watch
rcfgsim_active_sessions and the session error mix. Goal: no errors, latency within
expectation.
3. Multi-IP (5,000 devices)
Section titled “3. Multi-IP (5,000 devices)”Move to systemd instances and IP aliases — e.g. 2 IPs × 2,500 ports. This validates that your tooling handles many addresses and that per-instance drain works.
4. Half fleet (25,000 devices)
Section titled “4. Half fleet (25,000 devices)”10 IPs × 2,500 ports. Now you’re exercising scheduler fairness and burst handling: fire all backups in a tight window and watch for queueing, timeouts, and resource pressure on both sides. Introduce faults at a low rate here.
5. Full fleet (50,000 devices)
Section titled “5. Full fleet (50,000 devices)”20 IPs × 2,500 ports on the reference host (12 vCPU / 48 GB). This is the production-shaped test: a realistic size distribution, faults at a realistic rate, and a full collection cycle. Goal: characterize throughput, tail latency, and the resource envelope of the system under test.
What to measure at each rung
Section titled “What to measure at each rung”- Throughput —
rate(rcfgsim_sessions_total{result="ok"}[1m]) - Error mix — non-
okresults as a share of total - Tail latency — p95/p99 of
rcfgsim_command_duration_seconds - Concurrency headroom —
rcfgsim_active_sessionsvs your cap - Host envelope — Go/process collectors (goroutines, FDs, CPU, memory)
See Grafana queries for the exact PromQL.
Tips for clean runs
Section titled “Tips for clean runs”- Keep the same host key across restarts so clients don’t see host-key-changed warnings.
- Use a fixed
--seedso each run faces the identical fleet. - Restart instances with
systemctl restartto get a graceful drain rather than killing them.