151 Service Discovery and Health Checks

Discovery systems answer two questions: where instances are, and whether they are currently safe to route to.

System Shape

instance -> register/heartbeat -> registry
router   -> query endpoints   -> select healthy target
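
The flow above can be sketched with an in-memory registry and a round-robin router. This is only an illustration under stated assumptions: the Registry and Router classes, their method names, and the example addresses are hypothetical, and health filtering is left out to keep the shape visible.

```python
from itertools import count


class Registry:
    """Hypothetical in-memory registry: service name -> {instance_id: address}."""

    def __init__(self):
        self._services = {}

    def register(self, service, instance_id, address):
        # Called by an instance at startup; repeating the call acts as a heartbeat.
        self._services.setdefault(service, {})[instance_id] = address

    def deregister(self, service, instance_id):
        # Removing an unknown instance is a no-op.
        self._services.get(service, {}).pop(instance_id, None)

    def endpoints(self, service):
        # Sorted by instance id so callers see a stable order.
        entries = self._services.get(service, {})
        return [address for _, address in sorted(entries.items())]


class Router:
    """Round-robin selection over whatever the registry currently returns."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = count()

    def pick(self, service):
        targets = self._registry.endpoints(service)
        if not targets:
            raise LookupError(f"no endpoints registered for {service!r}")
        return targets[next(self._counter) % len(targets)]


registry = Registry()
registry.register("billing", "a", "10.0.0.1:8080")
registry.register("billing", "b", "10.0.0.2:8080")
router = Router(registry)
```

Because the router queries the registry on every pick, a deregistered instance stops receiving traffic on the next request. A real router would likely cache endpoints and refresh them, trading freshness for lookup cost.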

Health Semantics

Use distinct checks:

  • Liveness: process is alive.
  • Readiness: instance can serve traffic.

Routing on liveness alone sends traffic to instances that are running but cannot yet serve it, which creates avoidable errors during warmup, migrations, and dependency degradation.
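
One way to picture the difference is an instance that reports the two checks separately. The sketch below is hypothetical: the Instance fields (process_up, warmed_up, dependencies_ok) and the routable helper are assumed names chosen for illustration, not part of any particular discovery system.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    address: str
    process_up: bool = True        # liveness signal
    warmed_up: bool = False        # e.g. caches loaded, migrations finished
    dependencies_ok: bool = True   # e.g. database reachable

    def live(self):
        return self.process_up

    def ready(self):
        # Readiness implies liveness, plus everything needed to serve a request.
        return self.process_up and self.warmed_up and self.dependencies_ok


def routable(instances):
    """Return addresses of instances that are ready, not merely alive."""
    return [i.address for i in instances if i.ready()]


instances = [
    Instance("10.0.0.1:8080", warmed_up=True),
    Instance("10.0.0.2:8080"),                                         # still warming up
    Instance("10.0.0.3:8080", warmed_up=True, dependencies_ok=False),  # degraded dependency
]
```

All three instances pass the liveness check, but routable(instances) returns only the first address. Filtering on live() instead would route to the other two as well.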

Expiration Strategy

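TTL-based registration bounds how long a stale endpoint can receive traffic. Each entry expires unless the instance renews it with a heartbeat, so an instance that crashes or disconnects unexpectedly drops out of the registry within one TTL.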
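
A minimal sketch of TTL expiry, assuming a single-process registry keyed by address. The TTLRegistry class and its injectable clock parameter are hypothetical and exist only to show the mechanism. A production registry would also need persistence and concurrency control.

```python
import time


class TTLRegistry:
    """Hypothetical registry where every entry expires unless renewed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._expires_at = {}        # address -> expiry timestamp

    def heartbeat(self, address):
        # Registration and renewal are the same operation: push the expiry forward.
        self._expires_at[address] = self._clock() + self._ttl

    def endpoints(self):
        now = self._clock()
        # Drop entries whose TTL has lapsed before answering the query.
        self._expires_at = {a: t for a, t in self._expires_at.items() if t > now}
        return sorted(self._expires_at)


# Simulated timeline with a fake clock: "b" stops sending heartbeats after t=0.
now = [0.0]
registry = TTLRegistry(ttl_seconds=10, clock=lambda: now[0])
registry.heartbeat("10.0.0.1:8080")
registry.heartbeat("10.0.0.2:8080")
now[0] = 8.0
registry.heartbeat("10.0.0.1:8080")   # only the first instance renews
now[0] = 12.0
alive_at_12 = registry.endpoints()
```

The TTL is a trade-off: a short TTL removes dead instances quickly but increases heartbeat traffic and the risk of expiring a healthy instance after a brief network delay.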

Reliability Principle

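Discovery quality directly affects tail latency and error rate: a stale or incorrect registry sends requests to dead or unready instances, and those requests fail or time out. Treat registry correctness as critical infrastructure.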