162 Metrics and Prometheus Design

Metrics should answer operational questions quickly: Is traffic normal? Are errors rising? Is latency degrading?

RED Baseline

Rate: requests/sec
Errors: failed requests/sec
Duration: latency distribution (histogram)
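As a rough illustration of the three RED signals, the sketch below computes them for one time window from a list of request records. The `Request` record, the bucket bounds, and the choice to count only 5xx responses as errors are hypothetical assumptions, not a standard. The buckets are cumulative, matching how Prometheus histograms expose `le` buckets, which is why a histogram (rather than a single average) lets you estimate latency percentiles later.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical request record used only for this illustration.
    duration_s: float
    status: int


# Hypothetical bucket upper bounds in seconds; the last bucket catches everything.
BUCKETS = [0.05, 0.1, 0.25, 0.5, 1.0, float("inf")]


def red_summary(requests, window_s):
    """Return (rate, error rate, cumulative duration buckets) for one window."""
    # Rate: requests per second over the window.
    rate = len(requests) / window_s
    # Errors: failed requests per second; assumes only 5xx counts as a failure.
    errors = sum(1 for r in requests if r.status >= 500) / window_s
    # Duration: place each observation in the first bucket whose bound >= duration.
    counts = [0] * len(BUCKETS)
    for r in requests:
        counts[bisect_left(BUCKETS, r.duration_s)] += 1
    # Convert per-bucket counts to cumulative counts ("less than or equal" semantics).
    cumulative, running = [], 0
    for c in counts:
        running += c
        cumulative.append(running)
    return rate, errors, dict(zip(BUCKETS, cumulative))
```

For example, four requests in a 10-second window, one of which returned 503, yield a rate of 0.4 req/s and an error rate of 0.1 req/s; the `inf` bucket always equals the total request count.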

Label Strategy

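Labels are powerful and dangerous. Every unique combination of label values creates a separate time series, so a high-cardinality label multiplies the series count and can exhaust memory and slow queries in the metrics system.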

Prefer bounded dimensions such as:

  • method
  • route template (e.g. /users/{id}, not /users/12345)
  • status class (2xx, 4xx, 5xx)

Avoid unbounded dimensions such as user IDs or raw URLs.
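One way to keep labels bounded is to normalize raw request data before it reaches the metrics library. The sketch below is a minimal illustration under stated assumptions: the route table, the `OTHER` and `unmatched` fallback values, and the helper names are all hypothetical. It maps raw paths to route templates, unknown methods to a single fallback value, and status codes to their class, then computes the worst-case series count as the product of the distinct values per label.

```python
import math
import re

# Hypothetical route table: regex over raw paths -> bounded route template.
ROUTES = [
    (re.compile(r"^/users/\d+$"), "/users/{id}"),
    (re.compile(r"^/orders/[0-9a-f-]+/items$"), "/orders/{order_id}/items"),
]
KNOWN_METHODS = {"GET", "POST", "PUT", "PATCH", "DELETE"}


def normalize_labels(method, raw_path, status):
    """Map one raw request onto bounded label values."""
    method = method.upper()
    if method not in KNOWN_METHODS:
        method = "OTHER"  # collapse unexpected methods into one value
    route = "unmatched"  # never fall back to the raw path itself
    for pattern, template in ROUTES:
        if pattern.match(raw_path):
            route = template
            break
    status_class = f"{status // 100}xx"  # e.g. 404 -> "4xx"
    return {"method": method, "route": route, "status_class": status_class}


def max_series(label_value_sets):
    """Worst-case series count for one metric: product of distinct values per label."""
    return math.prod(len(values) for values in label_value_sets.values())
```

With six method values, three route values, and five status classes, one metric can produce at most 90 series. Adding a user ID label would multiply that bound by the number of users, which is exactly the unbounded growth to avoid.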

SRE Connection

Useful metrics are tied to alerts, and alerts are tied to runbooks. Instrumentation without a response policy adds noise, not reliability.
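To make the "no response policy, no alert" idea concrete, here is a minimal sketch that refuses to define an alert without a runbook link and evaluates an error-ratio threshold from request and error rates. The `AlertRule` type, its field names, the 5% threshold, and the example URL are hypothetical; real alerting systems express this in their own rule format, often by attaching a runbook link as an annotation on each alert.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    # Hypothetical alert definition; field names are illustrative only.
    name: str
    threshold: float  # error ratio above which the alert fires
    runbook_url: str  # where the responder starts

    def __post_init__(self):
        # Enforce the policy: an alert without a runbook is rejected at definition time.
        if not self.runbook_url:
            raise ValueError(f"alert {self.name!r} has no runbook")


def evaluate(rule, errors_per_s, requests_per_s):
    """Return a page message if the error ratio exceeds the threshold, else None."""
    if requests_per_s == 0:
        return None  # no traffic: the error ratio is undefined
    ratio = errors_per_s / requests_per_s
    if ratio > rule.threshold:
        return (f"{rule.name}: error ratio {ratio:.1%} > {rule.threshold:.1%}; "
                f"see {rule.runbook_url}")
    return None
```

For instance, `AlertRule("HighErrorRatio", 0.05, "https://runbooks.example.com/high-error-ratio")` fires at 0.1 errors/s against 0.4 req/s (a 25% error ratio), and the page itself points the responder at the runbook.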