162 Metrics and Prometheus Design
Metrics should answer operational questions quickly: Is traffic normal? Are errors rising? Is latency degrading?
RED Baseline
Rate: requests/sec
Errors: failed requests/sec
Duration: latency distribution (histogram)
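The three RED signals can be sketched from raw request records. This is a minimal illustration using only the Python standard library, not a Prometheus client: the `Request` type, the `red_summary` function, the bucket bounds, and the choice to count only 5xx responses as failures are all assumptions made for the example. The histogram is cumulative, in the style of Prometheus `le` buckets.

```python
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical bucket upper bounds in seconds; pick bounds that bracket your latency targets.
BUCKETS = (0.05, 0.1, 0.25, 0.5, 1.0, float("inf"))


@dataclass
class Request:
    status: int        # HTTP status code
    duration_s: float  # time spent serving the request, in seconds


def red_summary(requests, window_s):
    """Compute rate, error rate, and a cumulative latency histogram for one window."""
    total = len(requests)
    # Assumption: only 5xx responses count as failures.
    failed = sum(1 for r in requests if r.status >= 500)

    counts = [0] * len(BUCKETS)
    for r in requests:
        # Index of the first bucket whose upper bound is >= the observed duration.
        counts[bisect_left(BUCKETS, r.duration_s)] += 1

    # Convert per-bucket counts into cumulative counts ("less than or equal to bound").
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(running)

    return {
        "rate": total / window_s,      # requests/sec
        "errors": failed / window_s,   # failed requests/sec
        "duration_buckets": dict(zip(BUCKETS, cumulative)),
    }
```

A histogram is used for duration rather than a single average because an average hides tail latency; bucket counts let you estimate percentiles after the fact.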
Label Strategy
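Labels are powerful and dangerous: every distinct combination of label values creates its own time series, so high-cardinality labels inflate memory use and query cost and can destabilize the metrics system.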
Prefer bounded dimensions such as:
- method (GET, POST)
- route template (/users/{id}, not /users/42)
- status class (2xx, 4xx, 5xx)
Avoid unbounded dimensions such as user IDs or raw URLs.
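One way to keep labels bounded is to normalize request attributes before they reach the metrics library. The sketch below is illustrative only: the `ROUTES` table, the `bounded_labels` function, and the `"other"` fallback value are hypothetical names chosen for this example, and a real service would normally take the route template from its router rather than re-matching paths.

```python
import re

# Hypothetical route table mapping path patterns to route templates.
ROUTES = (
    (re.compile(r"^/users/[^/]+$"), "/users/{id}"),
    (re.compile(r"^/orders/[^/]+/items$"), "/orders/{id}/items"),
    (re.compile(r"^/health$"), "/health"),
)

KNOWN_METHODS = frozenset({"GET", "POST", "PUT", "PATCH", "DELETE", "HEAD", "OPTIONS"})


def bounded_labels(method, raw_url, status):
    """Map raw request attributes to a small, fixed set of label values."""
    # Drop the query string so it can never leak into a label.
    path = raw_url.split("?", 1)[0]
    # Unmatched paths collapse into one value instead of one series per URL.
    route = next((template for pattern, template in ROUTES if pattern.match(path)), "other")
    method = method.upper()
    return {
        "method": method if method in KNOWN_METHODS else "other",
        "route": route,
        "status_class": f"{status // 100}xx",
    }
```

With this shape, the number of possible series is at most (methods + 1) x (routes + 1) x (status classes), regardless of how many users or URLs the service sees.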
SRE Connection
Useful metrics are tied to alerts and runbooks. Instrumentation without a response policy adds noise, not reliability.
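The link between a metric, an alert, and a runbook can be made explicit in code. The following is a minimal sketch under assumptions, not a Prometheus alerting rule: `AlertRule`, `evaluate`, the 5% threshold, and the example runbook URL are all hypothetical. The point it illustrates is that a rule without a runbook is rejected rather than silently allowed to fire.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    threshold: float  # maximum tolerated error ratio (errors / requests)
    runbook_url: str  # where the responder finds the next step


def evaluate(rule, error_rate, request_rate):
    """Return an alert payload when the error ratio exceeds the threshold, else None."""
    if not rule.runbook_url:
        # An alert nobody knows how to act on is noise, so refuse to evaluate it.
        raise ValueError(f"alert {rule.name!r} has no runbook")
    if request_rate <= 0:
        # No traffic in the window: the ratio is undefined, so do not fire.
        return None
    ratio = error_rate / request_rate
    if ratio > rule.threshold:
        return {"alert": rule.name, "error_ratio": ratio, "runbook": rule.runbook_url}
    return None


# Hypothetical rule: fire when more than 5% of requests fail.
rule = AlertRule("HighErrorRatio", 0.05, "https://runbooks.example/high-error-ratio")
```

Alerting on the ratio of errors to requests, rather than on the raw error count, keeps the rule meaningful as traffic grows or shrinks.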