Metrics

Prometheus series exposed by every component.

Every Krypton component emits Prometheus metrics. There are three scrape targets, one per component (manager, control plane, gateway), on the manager’s metrics port; in addition, the sidecar exposes its own /metrics endpoint on the sidecar port (8888) in each pod.

Series

Gateway

Metric                                 Type        Labels
krypton_invocations_total              counter     agent, namespace, status
krypton_invocation_duration_seconds    histogram   agent, namespace
krypton_cold_starts_total              counter     agent, namespace
krypton_buffer_depth                   gauge       agent, namespace

Histogram buckets: 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s, 30s.
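
Because latency is recorded in fixed buckets, any percentile read from this histogram is an estimate whose precision depends on the bucket widths. The sketch below illustrates the idea with linear interpolation inside the bucket that contains the target rank, similar in spirit to PromQL's histogram_quantile. The function name and input shape are assumptions made for illustration, not part of Krypton.

```python
from bisect import bisect_left

# Upper bounds in seconds, matching the bucket list above; the +Inf bucket is implicit.
BUCKETS = [0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0]


def estimate_quantile(q, cumulative_counts, total):
    """Estimate the q-quantile (0 < q <= 1) of an invocation-duration histogram.

    cumulative_counts[i] is the number of observations <= BUCKETS[i];
    total is the overall observation count (the +Inf bucket).
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if len(cumulative_counts) != len(BUCKETS):
        raise ValueError("expected one cumulative count per bucket")
    if total == 0:
        return None  # no observations yet
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(cumulative_counts, rank)
    if i == len(BUCKETS):
        # The quantile lies above 30s; the highest finite bound is the best answer available.
        return BUCKETS[-1]
    lower = BUCKETS[i - 1] if i > 0 else 0.0
    below = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - below
    # Assume observations are spread evenly across the bucket.
    return lower + (BUCKETS[i] - lower) * (rank - below) / in_bucket
```

An estimate can never be finer than the bucket it lands in: a true P95 of 300ms and one of 450ms both fall in the 250ms to 500ms bucket.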

Scaler (hosted in manager)

Metric                            Type      Labels
krypton_scaler_decisions_total    counter   agent, namespace, direction (up/down/noop)
krypton_agent_replicas_desired    gauge     agent, namespace

Control plane

Metric                                  Type        Labels
krypton_api_requests_total              counter     route, method, code
krypton_api_request_duration_seconds    histogram   route

route is a route template (list_agents, get_agent, …), not the raw URL, so cardinality stays bounded.
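
As a rough illustration of why templates bound cardinality, the sketch below maps raw request paths to a fixed set of label values. Only the names list_agents and get_agent come from the text above; the URL patterns, the "unmatched" fallback, and the function name are hypothetical.

```python
import re

# Hypothetical URL patterns; the real control-plane paths may differ.
ROUTE_TEMPLATES = [
    (re.compile(r"^/v1/agents/?$"), "list_agents"),
    (re.compile(r"^/v1/agents/[^/]+/?$"), "get_agent"),
]


def route_label(path):
    """Return a bounded `route` label value for a raw request path."""
    for pattern, name in ROUTE_TEMPLATES:
        if pattern.match(path):
            return name
    # Collapse everything unrecognised into one value so that arbitrary
    # URLs cannot create new time series.
    return "unmatched"
```

With this shape, requests for a thousand different agent names all increment the same get_agent series instead of creating a thousand new ones.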

Sidecar (krypton-proxy)

Metric                          Type      Labels
krypton_proxy_requests_total    counter   agent, namespace, code
krypton_proxy_rejected_total    counter   agent, namespace, reason
krypton_proxy_inflight          gauge     agent, namespace

reason is either over_capacity (the concurrency cap was reached) or shutting_down.
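
To make the relationship between the in-flight gauge and the two rejection reasons concrete, here is a minimal sketch of an admission gate. It is not the krypton-proxy implementation: the class, its fields, and the choice to check shutdown before capacity are all assumptions.

```python
import threading


class Admission:
    """Hypothetical admission gate for a sidecar with a fixed concurrency cap."""

    def __init__(self, max_inflight):
        self.max_inflight = max_inflight
        self.inflight = 0  # would back krypton_proxy_inflight
        self.shutting_down = False
        # Would back krypton_proxy_rejected_total, keyed by the reason label.
        self.rejected = {"over_capacity": 0, "shutting_down": 0}
        self._lock = threading.Lock()

    def try_acquire(self):
        """Return None if the request is admitted, otherwise the rejection reason."""
        with self._lock:
            if self.shutting_down:
                reason = "shutting_down"
            elif self.inflight >= self.max_inflight:
                reason = "over_capacity"
            else:
                self.inflight += 1
                return None
            self.rejected[reason] += 1
            return reason

    def release(self):
        """Mark one admitted request as finished."""
        with self._lock:
            self.inflight -= 1
```

A caller would pair every successful try_acquire() with a release() in a finally block so the gauge returns to zero when requests fail.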

Grafana

A starter dashboard is available at deploy/grafana/krypton-overview.json. Import it via Dashboards → New → Import, then pick your Prometheus datasource for the DS_PROM variable.

Panels:

  • Invocations per second (per agent)
  • P95 invocation latency
  • Cold starts per minute
  • Desired replicas
  • Buffer depth
  • Scaling decisions per minute (by direction)
  • Sidecar in-flight (per pod)
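
The per-second and per-minute panels are presumably rates derived from the counter series; the dashboard JSON is the authority on the exact queries. As a sketch of the underlying arithmetic, the hypothetical helper below turns raw counter samples into a per-second rate and tolerates counter resets, which occur when a component restarts.

```python
def per_second_rate(samples):
    """Average per-second increase of a counter.

    samples is a list of (unix_seconds, counter_value) pairs, oldest first.
    Returns None when there is not enough data to compute a rate.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the process restarted and the counter began again
        # from zero, so the whole current value counts as new increase.
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None
```

Multiplying the result by 60 gives the per-minute figure used for cold starts and scaling decisions.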

OpenTelemetry / tracing

Tracing is not in the MVP and will be added as a follow-up. For now, the metric series above plus structured logs (agent_name, invocation_id, trace_id) cover the 80% case.
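
For readers wiring up their own agents, the sketch below shows one way to emit a log line carrying the three correlation fields. Only the field names come from the text above; the JSON layout, the level and message keys, the logger name, and the helper names are assumptions, and Krypton's own log format may differ.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object with the correlation fields."""

    FIELDS = ("agent_name", "invocation_id", "trace_id")

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        for field in self.FIELDS:
            # Values passed via `extra=` become attributes on the record.
            entry[field] = getattr(record, field, None)
        return json.dumps(entry)


def make_logger(stream=sys.stdout):
    """Build a logger that writes one JSON object per line to `stream`."""
    logger = logging.getLogger("krypton.example")  # hypothetical logger name
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as make_logger().info("invocation finished", extra={"agent_name": "summarizer", "invocation_id": "inv-1", "trace_id": "abc"}) then produces a line that can be filtered by trace_id and lined up against the agent label on the metrics.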