Observability (Step 4)

This project now emits structured JSON logs for runtime orchestration and high-risk baseline services.

Goals

  • Trace each execution with stable event names.
  • Measure latency (duration_ms) at runtime, capability, and service levels.
  • Capture success/failure metadata without changing capability contracts.

Logger Configuration

  • Logger name: agent_skills
  • Format: single-line JSON per event
  • Timestamp field: ts (UTC, ISO 8601 style, e.g. 2026-03-11T10:21:07Z)
  • Level env var: AGENT_SKILLS_LOG_LEVEL (default: INFO)
  • Correlation ID: trace_id, carried in the runtime context (auto-generated if not provided)
  • Max string length in logs: AGENT_SKILLS_LOG_MAX_STR_LEN (default: 512)
  • Max collection items in logs: AGENT_SKILLS_LOG_MAX_ITEMS (default: 50)
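
The settings above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's actual implementation: the JsonLineFormatter class, the _truncate helper, the build_logger function, and the "fields" attribute used to carry structured data are all hypothetical names. Only the logger name, the ts field, and the environment variable names come from the list above.

```python
import json
import logging
import os
from datetime import datetime, timezone

# Limits taken from the documented environment variables and defaults.
MAX_STR_LEN = int(os.environ.get("AGENT_SKILLS_LOG_MAX_STR_LEN", "512"))
MAX_ITEMS = int(os.environ.get("AGENT_SKILLS_LOG_MAX_ITEMS", "50"))


def _truncate(value):
    """Cap long strings and large collections (hypothetical helper)."""
    if isinstance(value, str):
        return value[:MAX_STR_LEN]
    if isinstance(value, (list, tuple)):
        return [_truncate(v) for v in list(value)[:MAX_ITEMS]]
    if isinstance(value, dict):
        items = list(value.items())[:MAX_ITEMS]
        return {str(k): _truncate(v) for k, v in items}
    return value


class JsonLineFormatter(logging.Formatter):
    """Render each record as one JSON object on a single line."""

    def format(self, record):
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "ts": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # "fields" is an assumed convention for structured data passed via `extra`.
        payload.update(_truncate(getattr(record, "fields", {})))
        return json.dumps(payload, default=str)


def build_logger():
    """Create (or reuse) the agent_skills logger with JSON output."""
    logger = logging.getLogger("agent_skills")
    logger.setLevel(os.environ.get("AGENT_SKILLS_LOG_LEVEL", "INFO").upper())
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonLineFormatter())
        logger.addHandler(handler)
    return logger
```

With this sketch, an event would be emitted as build_logger().info("skill.execute.start", extra={"fields": {"trace_id": "my-trace-001"}}).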

Runtime Events

Emitted from the execution pipeline:

  • skill.execute.start
  • skill.execute.completed
  • skill.execute.failed
  • step.execute.start
  • step.execute.completed
  • step.execute.failed
  • capability.execute.start
  • capability.execute.completed
  • capability.execute.failed
  • idempotency.request.created
  • idempotency.request.reused
  • idempotency.request.conflict
  • idempotency.keys.expired

Common fields:

  • trace_id
  • skill_id
  • step_id
  • capability_id
  • binding_id
  • service_id
  • duration_ms
  • error_type
  • error_message

Idempotency Telemetry

The async run launch path emits idempotency observability via both structured events and runtime counters.

Runtime counters (/v1/metrics):

  • runtime.idempotency.created
  • runtime.idempotency.reused
  • runtime.idempotency.conflict
  • runtime.idempotency.expired

Counter semantics:

  • created: a new async run was created with a non-empty idempotency key.
  • reused: an existing run was returned for the same key + same request fingerprint.
  • conflict: the key matched an existing run but fingerprint changed (HTTP 409).
  • expired: idempotency keys removed due to TTL expiration during prune/lookup.
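
The counter semantics above can be illustrated with a small in-memory model. This is a sketch under assumptions, not the runtime's code: the IdempotencyStore class, the SHA-256 fingerprint over canonical JSON, the "untracked" outcome for empty keys, and the TTL handling are all hypothetical. Only the four counter names and their meanings come from the lists above.

```python
import hashlib
import json
import time
import uuid
from collections import Counter


class IdempotencyStore:
    """Toy model of created / reused / conflict / expired accounting."""

    def __init__(self, ttl_seconds=3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (fingerprint, run_id, created_at)
        self.counters = Counter()

    @staticmethod
    def fingerprint(request):
        # Assumed fingerprint: SHA-256 over a canonical JSON encoding.
        canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def _prune(self):
        """Drop keys older than the TTL and count them as expired."""
        now = self._clock()
        expired = [k for k, (_, _, t) in self._entries.items() if now - t >= self._ttl]
        for key in expired:
            del self._entries[key]
        self.counters["runtime.idempotency.expired"] += len(expired)

    def launch(self, key, request):
        """Return (outcome, run_id); run_id is None on conflict."""
        if not key:
            # Runs without an idempotency key are not counted.
            return "untracked", uuid.uuid4().hex
        self._prune()
        fp = self.fingerprint(request)
        existing = self._entries.get(key)
        if existing is None:
            run_id = uuid.uuid4().hex
            self._entries[key] = (fp, run_id, self._clock())
            self.counters["runtime.idempotency.created"] += 1
            return "created", run_id
        if existing[0] == fp:
            self.counters["runtime.idempotency.reused"] += 1
            return "reused", existing[1]
        # Same key, different request: the HTTP layer would answer 409.
        self.counters["runtime.idempotency.conflict"] += 1
        return "conflict", None
```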

Grafana Dashboard and Alerts

Repository artifacts:

  • Dashboard: docs/grafana/agent_skills_dashboard.json
  • Alert rules: docs/grafana/agent_skills_idempotency_alerts.yaml

Idempotency panels included in the dashboard:

  • Idempotency Created (total)
  • Idempotency Reused (total)
  • Idempotency Conflict Rate (5m)
  • Idempotency Expiration Ratio (30m)
  • Idempotency Event Rates (5m)

Suggested alert thresholds encoded in alert rules:

  • Warning: idempotency conflict rate > 1% for 10m.
  • Critical: idempotency conflict rate > 3% for 10m.
  • Warning: expiration ratio > 20% for 30m.
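
As a rough illustration of how these thresholds could be checked against counter deltas, consider the sketch below. The authoritative expressions live in the alert rules file; the denominators used here (conflicts over all keyed launches, expirations over created keys) are assumptions, the classify function is hypothetical, and the "for 10m" / "for 30m" hold durations are not modelled.

```python
def classify(created, reused, conflict, expired):
    """Return alert labels for one evaluation window of counter deltas."""
    total = created + reused + conflict
    # Assumed definitions; check the alert rules for the real expressions.
    conflict_rate = conflict / total if total else 0.0
    expiration_ratio = expired / created if created else 0.0

    alerts = []
    if conflict_rate > 0.03:
        alerts.append("critical: conflict rate")
    elif conflict_rate > 0.01:
        alerts.append("warning: conflict rate")
    if expiration_ratio > 0.20:
        alerts.append("warning: expiration ratio")
    return alerts
```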

Import workflow:

  1. Import dashboard JSON into Grafana.
  2. Update datasource UID values in alert rules if your Prometheus UID differs.
  3. Import alert rules using Grafana provisioning or API.
  4. Validate live values against /v1/metrics/prometheus before enabling notifications.

Parallel Step Events

When the DAG scheduler executes steps in parallel (see docs/SCHEDULER.md), step.execute.start / step.execute.completed events from concurrent steps may interleave in the log stream. Correlation is by step_id within a given trace_id. The skill.execute.completed event is emitted only after all steps finish.
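
When reading an interleaved stream, events can be regrouped per step after the fact. The sketch below assumes each log line is a JSON object carrying event, trace_id, and step_id keys, as in the field lists above; the group_step_events function is a hypothetical helper, not part of the project.

```python
import json
from collections import defaultdict


def group_step_events(lines):
    """Group step.execute.* events by (trace_id, step_id), preserving order."""
    grouped = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        event = record.get("event", "")
        if event.startswith("step.execute."):
            key = (record.get("trace_id"), record.get("step_id"))
            grouped[key].append(event)
    return dict(grouped)
```

Each value in the result is the ordered event sequence of one step, so a step that started but never completed or failed shows up as a list containing only step.execute.start.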

Service Events

Instrumented baseline services:

  • code.snippet.execute via service.code.snippet.execute.start and service.code.snippet.execute
  • web.page.fetch via service.web.page.fetch.start and service.web.page.fetch
  • pdf.document.read via service.pdf.document.read.start and service.pdf.document.read
  • audio.speech.transcribe via service.audio.speech.transcribe.start and service.audio.speech.transcribe

Each service event includes a status (completed, rejected, or failed) and the latency in duration_ms.

When service logs are emitted inside runtime execution, trace_id is propagated automatically through context.

Example Log

{"ts":"2026-03-11T10:21:07Z","event":"service.web.page.fetch","status":"completed","http_status":200,"scheme":"https","host":"www.google.com","duration_ms":595.717}

Notes

  • Logging does not alter return payloads for any capability.
  • Optional fields are sanitized to JSON-safe values.
  • Sensitive fields are redacted based on key names (for example: token, password, authorization, api_key, secret, cookie).
  • To reduce verbosity in CI/local runs, set AGENT_SKILLS_LOG_LEVEL=WARNING.
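
Key-name redaction of the kind described above could look like the following sketch. The marker list mirrors the examples in the note; the case-insensitive substring match, the "[REDACTED]" placeholder, and the redact function itself are assumptions for illustration.

```python
SENSITIVE_MARKERS = ("token", "password", "authorization", "api_key", "secret", "cookie")


def _is_sensitive(key):
    """Case-insensitive substring match against the marker list (assumed policy)."""
    lowered = str(key).lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)


def redact(value):
    """Recursively replace values stored under sensitive-looking keys."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if _is_sensitive(k) else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, (list, tuple)):
        return [redact(v) for v in value]
    return value
```

Because matching is by key name only, a secret placed under an innocuous key (for example "note") would pass through unchanged in this sketch.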

Skill Execution Audit Records

In addition to runtime logs, skill execution now supports persisted audit records written as JSONL.

Default path:

  • artifacts/runtime_skill_audit.jsonl

Override path:

  • AGENT_SKILLS_AUDIT_PATH

Default mode:

  • AGENT_SKILLS_AUDIT_DEFAULT_MODE (standard by default)

Supported per-execution modes:

  • off: no persisted audit record for the execution
  • standard: metadata + hashes only (lightweight)
  • full: includes redacted payload snapshots per run and per step

Audit records include:

  • run metadata (trace_id, skill_id, status, channel, duration)
  • per-step metadata (step_id, uses, status, duration, binding/service IDs)
  • fallback and conformance metadata when available
  • hash references for inputs/outputs (sha256:*)

Sensitive values are redacted by key-name policy similar to runtime logs.
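
A hash reference in the sha256:* form can be produced as in the sketch below. The canonicalization (sorted keys, compact separators, UTF-8) and the hash_ref name are assumptions; the project may serialize payloads differently, in which case hashes computed this way would not match the stored ones.

```python
import hashlib
import json


def hash_ref(payload):
    """Return a 'sha256:<hex>' reference for a JSON-serializable payload."""
    canonical = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys makes the reference independent of dictionary ordering, so two logically equal payloads map to the same hash.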

CLI examples:

  • python cli/main.py run text.simple-summarize --audit-mode standard
  • python cli/main.py trace text.simple-summarize --audit-mode full
  • python cli/main.py audit-purge --older-than-days 30
  • python cli/main.py audit-purge --trace-id <trace-id>
  • python cli/main.py audit-purge --all

End-to-End Validation Checklist

Use this checklist to validate that audit persistence is operational and mode-aware.

  1. Reset previous audit records:

python cli/main.py audit-purge --all

  2. Run a multi-step skill in standard mode:

python cli/main.py trace text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode standard

Expected:

  • Skill completes successfully.
  • A new JSONL record is written.
  • Record contains input_hash/output_hash and per-step hashes.
  • Record does not include full inputs, outputs, or per-step payload snapshots.

  3. Run the same skill in full mode:

python cli/main.py run text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode full

Expected:

  • Skill completes successfully.
  • A second JSONL record is written.
  • Record includes the same hashes plus redacted payload snapshots:

    • top-level inputs and outputs
    • per-step resolved_input and produced_output

  4. Verify audit file contents (PowerShell):

Get-Content artifacts/runtime_skill_audit.jsonl
  5. Validate user-managed deletion:

python cli/main.py audit-purge --trace-id <trace-id>

Expected:

  • Purge response reports deleted > 0 for matching records.
  • Remaining records are preserved.

OpenTelemetry (Optional)

agent-skills ships an optional OTel integration that emits distributed tracing spans without requiring a hard dependency on the SDK.

Installation

pip install "orca-agent-skills[otel]"
# or individually:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

When the opentelemetry-api package is importable, the engine emits spans automatically. When it is absent, every helper degrades to a no-op: zero overhead, no code guards needed.

Spans Emitted

| Span Name | Scope | Key Attributes |
|---|---|---|
| skill.execute | Per-skill | skill.id, skill.trace_id, skill.depth |
| step.execute | Per-step | step.id, step.uses, skill.id |

Errors are recorded on the span via record_exception + StatusCode.ERROR.

Configuration

Set standard OTel environment variables:

| Variable | Example |
|---|---|
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317 |
| OTEL_SERVICE_NAME | agent-skills |
| OTEL_RESOURCE_ATTRIBUTES | deployment.environment=production |

For OTLP/gRPC export with the SDK auto-configuration:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=agent-skills
agent-skills serve

Programmatic Setup

If you need custom configuration (e.g. batch span processor, custom resource), configure OTel before calling the engine:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# Now run agent-skills; spans flow automatically

Module Reference

runtime.otel_integration exposes:

  • otel_available() → bool: check if the SDK is installed
  • get_tracer(): returns a tracer or None
  • start_span(name, attributes): context manager yielding a span (or _NoopSpan)
  • record_exception(span, exc): record + set ERROR status
  • @traced(name): decorator wrapping a function in a span

CLI Trace Correlation

You can provide a correlation id from CLI:

  • python cli/main.py run <skill_id> --trace-id my-trace-001
  • python cli/main.py trace <skill_id> --trace-id my-trace-001

CognitiveState v1 Execution Trace

In addition to runtime events and structured logs, CognitiveState v1 adds a typed trace structure directly in ExecutionState.trace.

TraceState

Available after execution at state.trace:

state.trace.steps    # list[TraceStep] — one per executed step
state.trace.metrics  # TraceMetrics — live aggregate counters

TraceStep (per step)

Each step generates a frozen trace entry with data lineage:

| Field | Type | Description |
|---|---|---|
| step_id | str | Step identifier |
| capability_id | str | Capability invoked |
| status | str | completed or failed |
| started_at | datetime | Step start timestamp |
| ended_at | datetime | Step end timestamp |
| reads | tuple[str] | References resolved during input mapping |
| writes | tuple[str] | Targets written during output mapping |
| latency_ms | int | Wall-clock time in milliseconds |

Reads and writes enable dependency analysis, impact analysis when a step fails, and audit of data flow through capabilities.
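
To make the lineage idea concrete, the sketch below models a trace entry as a frozen dataclass and walks the reads/writes to find steps affected by a failure. The field names follow the TraceStep description above; the impacted_steps helper, the default values, and reference strings such as "vars.lang" are hypothetical and not part of CognitiveState v1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TraceStep:
    """Illustrative mirror of the documented per-step trace entry."""
    step_id: str
    capability_id: str
    status: str
    started_at: datetime
    ended_at: datetime
    reads: tuple[str, ...] = ()
    writes: tuple[str, ...] = ()
    latency_ms: int = 0


def impacted_steps(steps, failed_step_id):
    """List later steps that read, directly or transitively, what the failed step writes.

    `steps` is assumed to be in execution order.
    """
    tainted = set()
    impacted = []
    seen_failed = False
    for step in steps:
        if step.step_id == failed_step_id:
            seen_failed = True
            tainted.update(step.writes)
        elif seen_failed and tainted.intersection(step.reads):
            impacted.append(step.step_id)
            # Anything this step writes is derived from tainted data.
            tainted.update(step.writes)
    return impacted
```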

TraceMetrics (aggregate)

Updated after each step completes:

| Field | Type | Description |
|---|---|---|
| step_count | int | Total steps executed |
| llm_calls | int | LLM invocations (extracted from step meta) |
| tool_calls | int | Tool invocations |
| tokens_in | int | Input tokens consumed |
| tokens_out | int | Output tokens generated |
| elapsed_ms | int | Total accumulated wall-clock time |
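
The per-step update of these counters might look like the sketch below. The field names match the TraceMetrics description; the record_step method and the keys read from the step meta dictionary ("llm_calls", "tool_calls", "tokens_in", "tokens_out") are assumptions, since the meta schema is not described here.

```python
from dataclasses import dataclass


@dataclass
class TraceMetrics:
    """Illustrative mirror of the documented aggregate counters."""
    step_count: int = 0
    llm_calls: int = 0
    tool_calls: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    elapsed_ms: int = 0

    def record_step(self, latency_ms, meta=None):
        """Fold one completed step into the aggregate (hypothetical method)."""
        meta = meta or {}
        self.step_count += 1
        self.elapsed_ms += latency_ms
        # Meta keys are assumed; missing keys count as zero.
        self.llm_calls += int(meta.get("llm_calls", 0))
        self.tool_calls += int(meta.get("tool_calls", 0))
        self.tokens_in += int(meta.get("tokens_in", 0))
        self.tokens_out += int(meta.get("tokens_out", 0))
```

Note that summing per-step latencies yields accumulated time, which can exceed the wall-clock duration of the run when steps execute in parallel.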

Relationship to Existing Observability

The three observability surfaces are complementary:

| Surface | Location | Audience | Granularity |
|---|---|---|---|
| Runtime events | state.events | skill authors | lifecycle events |
| Structured logs | JSON stdout | ops/SRE | log lines with trace_id |
| CognitiveState trace | state.trace | callers / analysis tools | typed per-step lineage |

Full reference: docs/COGNITIVE_STATE_V1.md