
Fault injection examples

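Recipes for common resilience tests. All of them assume the small loopback fleet from the Quickstart: the manifest and host key under /tmp/rcfg-test/, served as 100 devices on 127.0.0.1, ports 12000–12099.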

Model a fleet where about 5% of logins fail. This is useful for verifying retry/backoff, and for checking that genuinely reachable devices aren’t marked dead after a single failed attempt.

./bin/rcfg-sim \
--listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
--manifest /tmp/rcfg-test/manifest.csv --host-key /tmp/rcfg-test/hostkey \
--fault-rate 0.05 --fault-types "auth_fail"
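With auth_fail on, a collector should retry a failed login a few times before giving up on a device. Below is a minimal sketch of exponential backoff with a bounded number of attempts, in bash. retry_with_backoff is a hypothetical helper, not part of rcfg-sim, and it assumes your per-device fetch is a single command that exits non-zero on failure.

```bash
#!/usr/bin/env bash
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_S CMD [ARGS...]
# Hypothetical helper: runs CMD up to MAX_ATTEMPTS times, doubling the
# (whole-second) delay after each failure. Returns 0 on the first success,
# 1 once every attempt has failed.
retry_with_backoff() {
  local max=$1 delay=$2 attempt=1
  shift 2
  while true; do
    if "$@"; then
      return 0
    fi
    if (( attempt >= max )); then
      return 1
    fi
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}
```

For example, retry_with_backoff 4 1 my-fetch 127.0.0.1 12000 (my-fetch standing in for your own per-device command) tries up to four times, sleeping 1, 2 and 4 seconds between attempts. If each attempt fails independently at 5% (an assumption about how faults are drawn), a device fails all four attempts with probability 0.05^4, about 6 in a million. So with retries in place, a device marked dead in this recipe points at the retry path rather than bad luck.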

Test partial-read handling and integrity checks: each affected config is cut off with a TCP reset after 20–40% of it has been sent.

./bin/rcfg-sim \
--listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
--manifest /tmp/rcfg-test/manifest.csv --host-key /tmp/rcfg-test/hostkey \
--fault-rate 0.1 --fault-types "disconnect_mid"
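A cut-off transfer can still look like a plausible config, so compare what you captured against a known-good reference rather than only checking that the fetch command succeeded. This is a minimal sketch that assumes you keep a SHA-256 digest from a clean fetch of each device (this page doesn't say the manifest records one). verify_config is a hypothetical helper and relies on GNU coreutils' sha256sum.

```bash
#!/usr/bin/env bash
# verify_config FILE EXPECTED_SHA256
# Hypothetical helper: fails if FILE is empty or its SHA-256 digest
# differs from the digest of a known-good capture.
verify_config() {
  local file=$1 expected=$2 actual
  if [ ! -s "$file" ]; then
    echo "empty capture: $file" >&2
    return 1
  fi
  actual=$(sha256sum "$file" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    echo "digest mismatch (truncated or altered?): $file" >&2
    return 1
  fi
}
```

With disconnect_mid at 0.1, roughly one capture in ten should fail this check. Your collector should record those as incomplete and retry them instead of storing a partial config as the device's current state.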

Combine a wide base delay (100–1000 ms) with the slow_response multiplier to stress timeout and concurrency handling.

./bin/rcfg-sim \
--listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
--manifest /tmp/rcfg-test/manifest.csv --host-key /tmp/rcfg-test/hostkey \
--response-delay-ms-min 100 --response-delay-ms-max 1000 \
--fault-rate 0.05 --fault-types "slow_response"
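To see how your collector classifies slow devices, bound each fetch with a deadline and record why it ended. The sketch below uses coreutils timeout, which exits with status 124 when the deadline expires. fetch_with_deadline is a hypothetical helper. The command you pass must be an executable, because timeout cannot run shell functions.

```bash
#!/usr/bin/env bash
# fetch_with_deadline SECONDS CMD [ARGS...]
# Hypothetical helper: runs CMD under coreutils `timeout` and prints
# ok, timeout or error; returns CMD's exit status (124 on timeout).
# CMD's own stdout passes through, so redirect it if CMD prints the config.
fetch_with_deadline() {
  local secs=$1 rc=0
  shift
  timeout "$secs" "$@" || rc=$?
  case $rc in
    0)   echo ok ;;
    124) echo timeout ;;
    *)   echo error ;;
  esac
  return "$rc"
}
```

Pick a deadline that a device at the 1000 ms upper end of the base delay meets comfortably. Then check that slow_response devices are reported as timeouts rather than generic errors. To exercise concurrency as well, fan the ports out with xargs -P. In that case the function needs export -f and a bash -c wrapper.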

Verify your parser and validation survive corrupted output.

./bin/rcfg-sim \
--listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
--manifest /tmp/rcfg-test/manifest.csv --host-key /tmp/rcfg-test/hostkey \
--fault-rate 0.1 --fault-types "malformed"
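This page doesn't say exactly what malformed output looks like, so treat the following as one cheap first check rather than full validation. It rejects captures that are empty or that contain bytes outside printable ASCII plus tab, CR and LF. looks_like_text is a hypothetical helper. The ASCII-only rule is an assumption that suits most device configs, but it will also flag legitimate non-ASCII text such as UTF-8 descriptions.

```bash
#!/usr/bin/env bash
# looks_like_text FILE
# Hypothetical first-pass check: succeeds only if FILE is non-empty and
# every byte is printable ASCII, tab, CR or LF (judged in the C locale).
looks_like_text() {
  local file=$1 bad
  [ -s "$file" ] || return 1
  # Delete every allowed byte; whatever survives is suspect.
  bad=$(LC_ALL=C tr -d '[:print:]\t\r\n' < "$file" | wc -c)
  [ "$bad" -eq 0 ]
}
```

Run it before handing a capture to your parser. A failure should be reported as a bad capture for that one device, not crash the whole run.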

Everything on, at a high rate: a stress test for end-to-end resilience.

./bin/rcfg-sim \
--listen-ip 127.0.0.1 --port-start 12000 --port-count 100 \
--manifest /tmp/rcfg-test/manifest.csv --host-key /tmp/rcfg-test/hostkey \
--fault-rate 0.2 --fault-types "auth_fail,disconnect_mid,slow_response,malformed"
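Before reading the counters, it helps to know what spread to expect. This minimal sketch assumes each session is independently faulted with probability equal to --fault-rate, so the fault count is binomial. expected_band is a hypothetical helper that prints the mean and an approximate 3-sigma band. This page doesn't state how rcfg-sim divides the rate among several fault types. For a per-type figure, substitute your own per-type probability, e.g. 0.2 / 4 = 0.05 if types were chosen uniformly (also an assumption).

```bash
#!/usr/bin/env bash
# expected_band SESSIONS PROBABILITY
# Hypothetical helper: prints "mean low high" for a binomial count, where
# low/high are mean -/+ 3 standard deviations (low clamped at 0), rounded.
expected_band() {
  awk -v n="$1" -v p="$2" 'BEGIN {
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    low = mean - 3 * sd
    if (low < 0) low = 0
    printf "%.0f %.0f %.0f\n", mean, low, mean + 3 * sd
  }'
}
```

For instance, expected_band 1000 0.2 prints 200 162 238. Over 1,000 sessions, a total fault count well outside roughly 162 to 238 suggests either that the rate isn't being applied as configured or that sessions are counted differently than you assumed.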

After a run, check the counters:

curl -s http://127.0.0.1:9100/metrics | grep rcfgsim_faults_injected_total

The per-type counts should roughly track your configured rate, allowing for random variation on small runs. See the metrics reference and the broader load-test scenarios.
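To turn the scrape into one line per fault type, a small awk filter is enough. This sketch assumes the counter carries a label called type whose value is the fault name, e.g. rcfgsim_faults_injected_total{type="auth_fail"} 7. Check the metrics reference for the actual label name and adjust the pattern. fault_counts is a hypothetical helper.

```bash
#!/usr/bin/env bash
# fault_counts: reads Prometheus text-format metrics on stdin and prints
# "<fault type> <count>" per type, sorted by type. Assumes a `type` label.
fault_counts() {
  awk '
    /^rcfgsim_faults_injected_total[{]/ {
      # Extract the value of the (assumed) type="..." label.
      if (match($0, /type="[^"]*"/)) {
        t = substr($0, RSTART + 6, RLENGTH - 7)
        counts[t] += $NF   # sample value is the last field (no timestamp)
      }
    }
    END { for (t in counts) printf "%s %d\n", t, counts[t] }
  ' | sort
}
```

Usage: curl -s http://127.0.0.1:9100/metrics | fault_counts. Compare each count with the number of sessions your collector made, multiplied by the per-type fault probability.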