151 Service Discovery and Health Checks
Discovery systems answer two questions: where instances are, and whether they are currently safe to route to.
System Shape
instance -> register/heartbeat -> registry
router -> query endpoints -> select healthy target
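The two flows above can be sketched in a few lines. This is a minimal in-memory illustration, not a real registry client: the names `Registry`, `Router`, `register`, `set_health`, `query`, and `pick`, the addresses, and the round-robin selection policy are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Endpoint:
    address: str
    healthy: bool = True


class Registry:
    """Hypothetical in-memory stand-in for a discovery registry."""

    def __init__(self):
        self._endpoints = {}  # instance_id -> Endpoint

    def register(self, instance_id, address):
        # instance -> register -> registry
        self._endpoints[instance_id] = Endpoint(address)

    def set_health(self, instance_id, healthy):
        # Record the latest health verdict for a known instance.
        self._endpoints[instance_id].healthy = healthy

    def query(self):
        # Return all known endpoints, ordered by instance id for determinism.
        return [self._endpoints[k] for k in sorted(self._endpoints)]


class Router:
    """Hypothetical router: query endpoints, then select a healthy target."""

    def __init__(self, registry):
        self._registry = registry
        self._counter = count()

    def pick(self):
        healthy = [e for e in self._registry.query() if e.healthy]
        if not healthy:
            raise LookupError("no healthy endpoints")
        # Round-robin over whatever is healthy at this moment (assumed policy).
        return healthy[next(self._counter) % len(healthy)].address


registry = Registry()
registry.register("a", "10.0.0.1:80")
registry.register("b", "10.0.0.2:80")
router = Router(registry)
first, second = router.pick(), router.pick()
registry.set_health("a", False)
third = router.pick()  # only "b" is eligible now
```

The router filters on health at selection time instead of caching a target list, so a change in the registry takes effect on the next request.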
Health Semantics
Use distinct checks:
- Liveness: the process is running and responsive.
- Readiness: the instance can serve traffic right now.
Routing on liveness alone creates avoidable errors during warmup, migrations, and degraded dependencies.
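To make the distinction concrete, here is a small sketch in which an instance is live from the moment it starts but becomes ready only after warmup, outside migrations, and while its dependencies are reachable. The `Instance` class, its fields, and the `routable` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    warmed_up: bool = False
    migrating: bool = False
    # Hypothetical dependency map: name -> currently reachable?
    dependencies: dict = field(default_factory=dict)

    def liveness(self):
        # Liveness: the process is up and able to answer at all.
        return True

    def readiness(self):
        # Readiness: safe to receive traffic right now.
        return (
            self.warmed_up
            and not self.migrating
            and all(self.dependencies.values())
        )


def routable(instance):
    # Route only when both checks pass; liveness alone is not enough.
    return instance.liveness() and instance.readiness()


inst = Instance(dependencies={"db": True})
during_warmup = routable(inst)      # live, but not ready yet
inst.warmed_up = True
after_warmup = routable(inst)
inst.dependencies["db"] = False
degraded = routable(inst)           # still live, no longer ready
```

In all three states `liveness()` returns true, so a router that looked only at liveness would have sent traffic to the instance during warmup and while its database was unreachable.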
Expiration Strategy
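TTL-based registration gives each entry a time-to-live that the instance must renew with heartbeats. When an instance crashes or disconnects unexpectedly, renewals stop, the entry expires, and the stale endpoint stops receiving traffic.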
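A minimal sketch of TTL expiry follows, assuming an in-memory table and an injectable clock so the behavior can be checked without sleeping. `TTLRegistry`, `heartbeat`, `live_endpoints`, and the 10-second TTL are illustrative choices, not the API of any particular registry.

```python
import time


class TTLRegistry:
    """Hypothetical registry whose entries expire unless renewed."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # address -> expiry timestamp

    def heartbeat(self, address):
        # Registering and renewing are the same operation here:
        # each call pushes the expiry time forward by one TTL.
        self._expires_at[address] = self._clock() + self._ttl

    def live_endpoints(self):
        now = self._clock()
        # Drop entries whose TTL has elapsed, then return the survivors.
        self._expires_at = {
            addr: t for addr, t in self._expires_at.items() if t > now
        }
        return sorted(self._expires_at)


# Fake clock so the example is deterministic.
now = [0.0]
reg = TTLRegistry(ttl_seconds=10, clock=lambda: now[0])
reg.heartbeat("a")
reg.heartbeat("b")

now[0] = 8.0
reg.heartbeat("a")               # "a" renews; "b" has gone silent
at_8 = reg.live_endpoints()

now[0] = 12.0
at_12 = reg.live_endpoints()     # "b" expired at t=10
```

The TTL sets the trade-off: a shorter value removes dead endpoints sooner but makes a missed heartbeat from a healthy instance more costly.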
Reliability Principle
Discovery quality feeds directly into tail latency and error rate: a stale or wrong endpoint turns into timeouts, retries, and failed requests. Treat registry correctness as critical infrastructure.