
Fault injection overview

A fleet of 50,000 devices is never perfectly healthy. Some refuse logins, some drop mid-transfer, some crawl, some return garbage. rcfg-sim can inject these faults on purpose so you can test how your tooling copes.

Two flags control fault injection:

Flag            Default     Purpose
--fault-types   "" (none)   Comma-separated list of faults to enable
--fault-rate    0.0         Probability [0,1] that an enabled fault fires per event

Example invocation:
./bin/rcfg-sim \
  --listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
  --manifest /tmp/rcfg-test/manifest.csv \
  --host-key /tmp/rcfg-test/hostkey \
  --fault-rate 0.05 \
  --fault-types "auth_fail,slow_response"

This enables two fault types, each with a 5% chance of firing on a relevant event. With the default --fault-rate 0.0, no faults fire regardless of --fault-types — fault injection is off by default and has negligible overhead when disabled.
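To sanity-check a rate before a run, the per-event probability can be turned into expected counts. The sketch below is plain arithmetic and not part of rcfg-sim. It assumes each relevant event is an independent roll at --fault-rate, and the function names are hypothetical.

```python
def expected_faults(events: int, fault_rate: float) -> float:
    """Expected number of activations over `events` independent rolls."""
    return events * fault_rate


def prob_at_least_one(events: int, fault_rate: float) -> float:
    """Probability that a fault fires at least once in `events` rolls."""
    return 1.0 - (1.0 - fault_rate) ** events


if __name__ == "__main__":
    # Example: 1,000 auth attempts at --fault-rate 0.05 (assumed independent).
    print("expected auth_fail activations:", expected_faults(1000, 0.05))
    # Chance that a 10-command session sees at least one fault of a given type.
    print("P(at least one in 10 events):", prob_at_least_one(10, 0.05))
```

If the counts you observe differ widely from these estimates over a large number of events, check which events actually count as "relevant" for each fault type.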

Faults are rolled per relevant event (an auth attempt, a command, a config stream) using a per-session RNG seeded from the session ID plus a timestamp. The result:

  • Deterministic within a session — a given session behaves consistently.
  • Random across the fleet — different sessions hit different faults, so aggregate behaviour is realistically varied.
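The per-session rolling described above can be sketched roughly as follows. This is an illustration, not rcfg-sim's actual implementation. The class name, the way the session ID and timestamp are combined into a seed, and the method names are all assumptions made for the example.

```python
import random
import time


class FaultRoller:
    """Per-session fault roller (illustrative only, not rcfg-sim code)."""

    def __init__(self, session_id: str, timestamp_ns: int,
                 fault_types: set[str], fault_rate: float) -> None:
        # Assumption: the seed is the session ID and a timestamp joined as text.
        self._rng = random.Random(f"{session_id}:{timestamp_ns}")
        self._fault_types = fault_types
        self._fault_rate = fault_rate

    def should_fire(self, fault_type: str) -> bool:
        """Roll once for a relevant event; only enabled faults can fire."""
        if fault_type not in self._fault_types or self._fault_rate <= 0.0:
            return False
        # random() is in [0.0, 1.0), so a rate of 1.0 always fires.
        return self._rng.random() < self._fault_rate


if __name__ == "__main__":
    enabled = {"auth_fail", "slow_response"}
    # Each session gets its own generator, so sessions diverge from each other.
    for session_id in ("session-a", "session-b"):
        roller = FaultRoller(session_id, time.time_ns(), enabled, 0.05)
        fired = sum(roller.should_fire("slow_response") for _ in range(1000))
        print(session_id, "slow_response fired", fired, "times in 1000 events")
```

Because each session owns one seeded generator, its sequence of rolls is fixed once the seed is chosen, while different seeds give different sequences across sessions.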

Every fault activation increments rcfgsim_faults_injected_total{type="..."}, so you can confirm faults are firing at the rate you expect and correlate them with your tool’s error handling. See the metrics reference.
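One way to check the counter against your expectations is to parse the metrics text and compare counts with the number of events your tool generated. The sketch below assumes the metrics are exposed in the Prometheus text format with type as the only label. The sample values are made up for illustration, the helper names are hypothetical, and how you fetch the metrics text is left to you.

```python
import re

# Matches lines such as: rcfgsim_faults_injected_total{type="auth_fail"} 52
# An optional trailing integer timestamp is tolerated.
_FAULT_LINE = re.compile(
    r'^rcfgsim_faults_injected_total\{type="([^"]+)"\}\s+(\S+)(?:\s+-?\d+)?$'
)

# Made-up sample, for illustration only.
SAMPLE = """\
# TYPE rcfgsim_faults_injected_total counter
rcfgsim_faults_injected_total{type="auth_fail"} 52
rcfgsim_faults_injected_total{type="slow_response"} 47
"""


def parse_fault_counts(metrics_text: str) -> dict[str, float]:
    """Return {fault type: count} from Prometheus-style metrics text."""
    counts: dict[str, float] = {}
    for line in metrics_text.splitlines():
        match = _FAULT_LINE.match(line.strip())
        if match:
            counts[match.group(1)] = float(match.group(2))
    return counts


def observed_rate(count: float, events: int) -> float:
    """Fraction of events on which the fault fired (0.0 if no events)."""
    return count / events if events else 0.0


if __name__ == "__main__":
    counts = parse_fault_counts(SAMPLE)
    # `events` must come from your own tooling, e.g. login attempts made.
    for fault_type, count in counts.items():
        print(fault_type, observed_rate(count, events=1000))
```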