SLO / SLI: Service Level Objectives & Indicators

Per-capability latency targets and error budgets for agent-skills.

Definitions

Term	Meaning
SLI (Service Level Indicator)	A measurable metric — e.g., p95 latency, error rate
SLO (Service Level Objective)	A target for an SLI — e.g., p95 < 200 ms
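Error Budget	The share of executions allowed to fail before the SLO is breached (100% minus the success target); e.g., a 99.9% success target leaves 0.1% of executions as budget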
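
The error budget is simple arithmetic on the success target. The sketch below is illustrative only: `error_budget` and `budget_consumed` are hypothetical helper names, not part of the agent-skills runtime. It shows how many failures a window can absorb and how much of that allowance has been spent.

```python
import math


def error_budget(success_slo_pct: float, total_executions: int) -> float:
    """Executions allowed to fail in a window: (1 - SLO) * total."""
    return (100.0 - success_slo_pct) / 100.0 * total_executions


def budget_consumed(success_slo_pct: float, total_executions: int, failed: int) -> float:
    """Fraction of the error budget already spent; 1.0 means exhausted."""
    budget = error_budget(success_slo_pct, total_executions)
    if budget <= 0:
        # A 100% SLO has no budget: any failure exhausts it immediately.
        return 0.0 if failed == 0 else math.inf
    return failed / budget


# A 99.9% success target over 1,000,000 executions leaves about 1,000 failures.
print(round(error_budget(99.9, 1_000_000)))            # 1000
print(f"{budget_consumed(99.9, 1_000_000, 250):.0%}")  # 25%
```

Floating-point subtraction makes the raw budget slightly inexact (999.99...), which is why the example rounds before printing.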

Default SLOs

These defaults apply to all capabilities unless overridden per-domain.

SLI	Target	Measurement
p50 latency	< 100 ms	Per-capability `execute()` wall clock
p95 latency	< 500 ms	Per-capability `execute()` wall clock
p99 latency	< 2000 ms	Per-capability `execute()` wall clock
Error rate	< 1%	Non-timeout failures / total executions
Availability	> 99.5%	Successful responses / total requests
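
To check a batch of `execute()` wall-clock samples against these defaults, a percentile has to be computed first. The following is a minimal sketch assuming the nearest-rank percentile definition. Prometheus histograms interpolate within buckets instead, so their numbers can differ. The names `DEFAULT_LATENCY_SLO_MS`, `nearest_rank_percentile`, and `latency_violations` are hypothetical.

```python
import math

# Default latency SLOs from the table above (milliseconds, strict "<" targets).
DEFAULT_LATENCY_SLO_MS = {"p50": 100, "p95": 500, "p99": 2000}


def nearest_rank_percentile(samples_ms, pct):
    """Nearest-rank percentile: the smallest sample >= pct% of all samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def latency_violations(samples_ms, targets=DEFAULT_LATENCY_SLO_MS):
    """Return {sli: (actual, target)} for every percentile that misses its target."""
    violations = {}
    for name, target in targets.items():
        actual = nearest_rank_percentile(samples_ms, float(name[1:]))
        if actual >= target:  # targets are "< target", so equality is a miss
            violations[name] = (actual, target)
    return violations


# 100 executions: mostly fast, a few slow outliers.
samples = [40] * 90 + [450] * 8 + [2500] * 2
print(latency_violations(samples))  # {'p99': (2500, 2000)}
```

Two slow outliers out of 100 are enough to push p99 over its target while p50 and p95 stay well inside theirs.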

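Per-Domain SLO Overrides

Overrides move the defaults in both directions. Local, pure-Python domains get tighter targets, while domains that call external APIs, move large payloads, or trigger sub-skills get relaxed ones: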

Domain	p95 Latency	p99 Latency	Error Rate	Notes
`text.*` (baseline)	100 ms	500 ms	< 0.5%	Pure Python, deterministic
`text.*` (OpenAI)	2000 ms	5000 ms	< 2%	Network-bound
`data.*`	100 ms	500 ms	< 0.5%	Pure Python
`code.*`	200 ms	1000 ms	< 1%	May involve parsing
`model.*` (baseline)	100 ms	500 ms	< 0.5%	Local heuristics
`model.*` (OpenAI)	3000 ms	8000 ms	< 3%	LLM inference
`web.*`	2000 ms	5000 ms	< 5%	Network-bound, external sites
`audio.*`	5000 ms	15000 ms	< 5%	Large payloads
`agent.*`	500 ms	2000 ms	< 2%	May trigger sub-skills
`fs.*`	50 ms	200 ms	< 0.5%	Local filesystem
`email.*`	2000 ms	5000 ms	< 3%	SMTP/IMAP latency
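
Resolving the effective targets for a capability means picking the row for its domain and falling back to the defaults when no row matches. This is a minimal sketch under stated assumptions: the domain is taken to be the first dotted segment of the capability ID, and the `variant` label ("baseline" / "openai") is a hypothetical way of distinguishing the two `text.*` and `model.*` rows. The runtime's actual lookup may differ.

```python
SLI_KEYS = ("p95_latency_ms", "p99_latency_ms", "error_rate_pct")
DEFAULTS = dict(zip(SLI_KEYS, (500, 2000, 1.0)))

# (domain, variant) -> (p95 ms, p99 ms, error rate %), copied from the table above.
# variant is None for domains that have a single row (hypothetical keying scheme).
DOMAIN_OVERRIDES = {
    ("text", "baseline"): (100, 500, 0.5),
    ("text", "openai"): (2000, 5000, 2.0),
    ("data", None): (100, 500, 0.5),
    ("code", None): (200, 1000, 1.0),
    ("model", "baseline"): (100, 500, 0.5),
    ("model", "openai"): (3000, 8000, 3.0),
    ("web", None): (2000, 5000, 5.0),
    ("audio", None): (5000, 15000, 5.0),
    ("agent", None): (500, 2000, 2.0),
    ("fs", None): (50, 200, 0.5),
    ("email", None): (2000, 5000, 3.0),
}


def resolve_slo(capability_id: str, variant: str = "baseline") -> dict:
    """Exact (domain, variant) row first, then the domain's single row, then defaults."""
    domain = capability_id.split(".", 1)[0]
    row = DOMAIN_OVERRIDES.get((domain, variant)) or DOMAIN_OVERRIDES.get((domain, None))
    return dict(zip(SLI_KEYS, row)) if row else dict(DEFAULTS)


print(resolve_slo("text.content.summarize", variant="openai"))
# {'p95_latency_ms': 2000, 'p99_latency_ms': 5000, 'error_rate_pct': 2.0}
```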

Configuration

SLO targets can be configured per capability in the binding metadata:

# bindings/official/openapi_text_summarize.yaml
metadata:
  slo:
    p95_latency_ms: 2000
    p99_latency_ms: 5000
    error_rate_pct: 2.0

The runtime reads these at execution time and records violations as observability events.
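
One plausible way to combine a binding's `metadata.slo` block with the base targets is a key-by-key overlay. The sketch below assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `effective_slo` and the `p50_latency_ms` key are hypothetical. Rejecting unknown keys is a design choice for catching typos; the source does not say whether the runtime does this.

```python
def effective_slo(base_slo: dict, binding_doc: dict) -> dict:
    """Overlay a binding's metadata.slo block on a base SLO dict.
    Keys the binding omits keep their base values."""
    overrides = (binding_doc.get("metadata") or {}).get("slo") or {}
    unknown = set(overrides) - set(base_slo)
    if unknown:
        # A misspelled key would otherwise be silently ignored.
        raise ValueError(f"unknown SLO keys: {sorted(unknown)}")
    merged = dict(base_slo)
    merged.update(overrides)
    return merged


# What yaml.safe_load would return for the binding file above.
binding_doc = {"metadata": {"slo": {"p95_latency_ms": 2000, "p99_latency_ms": 5000, "error_rate_pct": 2.0}}}
defaults = {"p50_latency_ms": 100, "p95_latency_ms": 500, "p99_latency_ms": 2000, "error_rate_pct": 1.0}
print(effective_slo(defaults, binding_doc))
# {'p50_latency_ms': 100, 'p95_latency_ms': 2000, 'p99_latency_ms': 5000, 'error_rate_pct': 2.0}
```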

SLI Collection

SLIs are collected automatically by the runtime:

- RuntimeMetrics (runtime/metrics.py): counters and histograms per capability
- OTel spans (runtime/otel_integration.py): per-step latency, with exceptions recorded via record_exception
- Audit trail (runtime/audit.py): hash-chained execution records
- Prometheus endpoint (GET /v1/metrics): exposition format for scraping
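
The latency SLIs are defined on `execute()` wall-clock time. The toy recorder below shows the idea of timing each execution and counting its outcome per capability. It is a stand-in, not the RuntimeMetrics API: `LatencyRecorder`, `measure`, and the status values "success" / "error" are all hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LatencyRecorder:
    """Toy per-capability recorder (hypothetical; not runtime/metrics.py)."""

    def __init__(self):
        self.durations_ms = defaultdict(list)  # capability_id -> [ms, ...]
        self.status_counts = defaultdict(int)  # (capability_id, status) -> count

    @contextmanager
    def measure(self, capability_id):
        # perf_counter is monotonic, so the elapsed time is real (wall) time,
        # not CPU time, and it is unaffected by system clock changes.
        start = time.perf_counter()
        status = "success"
        try:
            yield
        except BaseException:
            status = "error"
            raise
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.durations_ms[capability_id].append(elapsed_ms)
            self.status_counts[(capability_id, status)] += 1


recorder = LatencyRecorder()
with recorder.measure("text.content.summarize"):
    time.sleep(0.01)  # stand-in for the capability's execute() call
```

Recording in `finally` ensures failed executions still contribute a latency sample, so slow failures are not hidden from the percentiles.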

Metrics Available

Metric	Type	Labels
`capability_execution_duration_ms`	Histogram	`capability_id`, `binding_protocol`
`capability_execution_total`	Counter	`capability_id`, `status`
`capability_execution_errors_total`	Counter	`capability_id`, `error_code`
`step_execution_duration_ms`	Histogram	`skill_id`, `step_id`
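
The error-rate and availability SLIs from the Default SLOs table can be derived from `capability_execution_total` split by `status`. This sketch makes two assumptions. First, the status label values are "success", "error", and "timeout"; the actual label values are not documented here. Second, executions are used as a proxy for requests when computing availability. `sli_from_counts` is a hypothetical helper.

```python
def sli_from_counts(counts: dict) -> dict:
    """counts maps a status label to an execution count for one capability_id."""
    total = sum(counts.values())
    if total == 0:
        return {"error_rate_pct": 0.0, "availability_pct": 100.0}
    # Error rate excludes timeouts; availability counts only successes as good.
    non_timeout_failures = counts.get("error", 0)
    successes = counts.get("success", 0)
    return {
        "error_rate_pct": 100.0 * non_timeout_failures / total,
        "availability_pct": 100.0 * successes / total,
    }


# Hypothetical snapshot of capability_execution_total for one capability_id.
snapshot = {"success": 9_940, "error": 40, "timeout": 20}
print(sli_from_counts(snapshot))
# {'error_rate_pct': 0.4, 'availability_pct': 99.4}
```

In this example the error-rate SLO (< 1%) is met but availability (> 99.5%) is not, because timeouts count against availability only.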

Async Idempotency SLI Set

For customer-facing async launch endpoints, track idempotency behavior as a first-class reliability signal.

Runtime counters (/v1/metrics):

- runtime.idempotency.created
- runtime.idempotency.reused
- runtime.idempotency.conflict
- runtime.idempotency.expired

Derived SLIs:

- Idempotency conflict rate:
  conflict_rate = conflict / max(created + reused + conflict, 1)
  Suggested SLO: < 0.5% over a rolling 30m window for stable clients.
- Retry dedup effectiveness:
  dedup_rate = reused / max(created + reused, 1)
  No universal target; establish a baseline per client profile and retry policy.
- Expiration pressure indicator:
  expiration_ratio = expired / max(created + reused, 1)
  Use it as a tuning input for AGENT_SKILLS_IDEMPOTENCY_TTL_SECONDS.
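
The three formulas translate directly into code. The sketch below is a minimal transcription over counter increments for a single window. The `IdempotencyCounts` name is hypothetical. The `max(..., 1)` guards come straight from the formulas and keep an idle window from dividing by zero.

```python
from dataclasses import dataclass


@dataclass
class IdempotencyCounts:
    """Counter increments observed in one window (hypothetical container)."""
    created: int = 0
    reused: int = 0
    conflict: int = 0
    expired: int = 0

    def conflict_rate(self) -> float:
        return self.conflict / max(self.created + self.reused + self.conflict, 1)

    def dedup_rate(self) -> float:
        return self.reused / max(self.created + self.reused, 1)

    def expiration_ratio(self) -> float:
        return self.expired / max(self.created + self.reused, 1)


window = IdempotencyCounts(created=900, reused=95, conflict=3, expired=40)
print(f"{window.conflict_rate():.2%}")     # 0.30%
print(f"{window.dedup_rate():.2%}")        # 9.55%
print(f"{window.expiration_ratio():.2%}")  # 4.02%
```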

Suggested alert thresholds:

- Warning: conflict_rate > 1% for 10m.
- Critical: conflict_rate > 3% for 10m.
- Warning: expiration_ratio > 20% for 30m (the TTL may be shorter than the clients' retry horizon).
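
"For 10m" means the condition must hold continuously, not merely once, before the alert fires. The sketch below approximates the `for:` semantics of a Prometheus alerting rule using evaluated samples. It is not the Prometheus engine. It assumes samples arrive oldest first as `(unix_seconds, ratio)` pairs, and `breach_started_at` and `evaluate` are hypothetical names.

```python
# (severity, threshold, hold_seconds), most severe first, from the list above.
CONFLICT_RULES = [("critical", 0.03, 600), ("warning", 0.01, 600)]
EXPIRATION_RULES = [("warning", 0.20, 1800)]


def breach_started_at(samples, threshold):
    """Timestamp where the current unbroken run of value > threshold began,
    or None if the latest sample is not above the threshold."""
    started = None
    for ts, value in samples:
        if value > threshold:
            if started is None:
                started = ts
        else:
            started = None  # any dip below the threshold resets the clock
    return started


def evaluate(samples, rules):
    """Most severe rule whose condition has held for at least hold_seconds."""
    if not samples:
        return None
    now = samples[-1][0]
    for severity, threshold, hold_s in rules:
        started = breach_started_at(samples, threshold)
        if started is not None and now - started >= hold_s:
            return severity
    return None


# One evaluation per minute; conflict_rate jumps to 1.5% at t=300 s.
samples = [(t, 0.002 if t < 300 else 0.015) for t in range(0, 901, 60)]
print(evaluate(samples, CONFLICT_RULES))  # warning
```

Requiring an unbroken run filters out single-scrape spikes, which is the usual reason for the `for` clause.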

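Prometheus query examples. These use increase() so that the clamp_min(..., 1) guard mirrors the max(..., 1) in the formulas above. Clamping a per-second rate() to 1 would understate the ratio whenever traffic is below one request per second.

Conflict rate (10m window):

sum(increase(agent_skills_runtime_idempotency_conflict_total[10m]))
/
clamp_min(
  sum(increase(agent_skills_runtime_idempotency_created_total[10m]))
  + sum(increase(agent_skills_runtime_idempotency_reused_total[10m]))
  + sum(increase(agent_skills_runtime_idempotency_conflict_total[10m])),
  1
)

Expiration ratio (30m window):

sum(increase(agent_skills_runtime_idempotency_expired_total[30m]))
/
clamp_min(
  sum(increase(agent_skills_runtime_idempotency_created_total[30m]))
  + sum(increase(agent_skills_runtime_idempotency_reused_total[30m])),
  1
)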
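
These ratios are computed from counter deltas over a window. The simplified Python equivalent below works from two raw scrapes of the counters. It handles a counter reset after a process restart the way Prometheus does, by treating the post-reset value as the increase. Unlike Prometheus's `increase()`, however, it does not extrapolate to the window edges. The helper names and scrape values are illustrative.

```python
KEYS = ("created", "reused", "conflict")


def counter_increase(start: float, end: float) -> float:
    """Growth of a monotonic counter between two scrapes. A lower end value
    means the process restarted and the counter reset to zero, so the end
    value itself is the best available estimate of the increase."""
    return end - start if end >= start else end


def conflict_rate_between(start: dict, end: dict) -> float:
    inc = {k: counter_increase(start.get(k, 0), end.get(k, 0)) for k in KEYS}
    return inc["conflict"] / max(inc["created"] + inc["reused"] + inc["conflict"], 1)


# Two scrapes of /v1/metrics taken 10 minutes apart (illustrative values).
scrape_t0 = {"created": 10_000, "reused": 1_200, "conflict": 30}
scrape_t1 = {"created": 10_480, "reused": 1_300, "conflict": 34}
print(f"{conflict_rate_between(scrape_t0, scrape_t1):.2%}")  # 0.68%
```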

SLO Violation Alerting

When a capability misses one of its SLO targets (for example, a measured p95 latency above the configured p95_latency_ms), the runtime emits:

log_event("slo.violation", {
    "capability_id": "text.content.summarize",
    "sli": "p95_latency_ms",
    "target": 2000,
    "actual": 3500,
    "binding_protocol": "openapi",
})

This event is:

- Logged at WARNING level
- Included in the audit trail
- Emitted as an OTel event (when enabled)
- Available via the webhook system (slo.violation event type)
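
Deciding whether an SLI is violated depends on its direction. Latency and error rate violate when they rise to or above the target, and availability violates when it falls to or below it. The sketch below builds the same payload shape as the `slo.violation` example above. It treats targets as strict bounds, matching the "<" and ">" in the Default SLOs table, which is an assumption. It uses the standard `logging` module as a stand-in for the runtime's `log_event`, and it does not show the audit, OTel, or webhook fan-out. `check_sli` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("agent_skills.slo")  # hypothetical logger name

# SLIs where higher is better; everything else (latency, error rate) is lower-is-better.
HIGHER_IS_BETTER = {"availability_pct"}


def check_sli(capability_id, sli, target, actual, binding_protocol):
    """Return an slo.violation payload (logged at WARNING), or None if the SLI
    meets its target. Targets are strict bounds, so hitting one exactly is a miss."""
    if sli in HIGHER_IS_BETTER:
        violated = actual <= target
    else:
        violated = actual >= target
    if not violated:
        return None
    payload = {
        "capability_id": capability_id,
        "sli": sli,
        "target": target,
        "actual": actual,
        "binding_protocol": binding_protocol,
    }
    logger.warning("slo.violation %s", payload)
    return payload


logging.basicConfig(level=logging.WARNING)
event = check_sli("text.content.summarize", "p95_latency_ms", 2000, 3500, "openapi")
```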

Benchmark Lab Integration

Use benchmark-lab to validate SLO compliance:

# Run 100 iterations and check against SLO targets
agent-skills benchmark-lab text.content.summarize --runs 100

# Compare baseline vs OpenAI binding latency
agent-skills benchmark-lab text.content.summarize --protocols pythoncall,openapi --runs 50

Review Cadence

SLO targets should be reviewed quarterly based on:

- Production metrics (if deployed)
- Benchmark lab results
- New binding additions (new protocols may shift baselines)
- User feedback on acceptable latency