Observability (Step 4)

This project now emits structured JSON logs for runtime orchestration and high-risk baseline services.

Goals

  • Trace each execution with stable event names.
  • Measure latency (duration_ms) at runtime, capability, and service levels.
  • Capture success/failure metadata without changing capability contracts.

Logger Configuration

  • Logger name: agent_skills
  • Format: single-line JSON per event
  • Timestamp field: ts (UTC, ISO-like)
  • Level env var: AGENT_SKILLS_LOG_LEVEL (default: INFO)
  • Correlation id: trace_id, supplied via runtime context (auto-generated if not provided)
  • Max string length in logs: AGENT_SKILLS_LOG_MAX_STR_LEN (default: 512)
  • Max collection items in logs: AGENT_SKILLS_LOG_MAX_ITEMS (default: 50)
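The configuration above can be approximated with the standard logging module; a minimal sketch (illustrative only, not this project's actual formatter):

```python
import json
import logging
import os
from datetime import datetime, timezone

class JsonLineFormatter(logging.Formatter):
    """Render each record as a single-line JSON object with a UTC `ts` field."""

    def format(self, record):
        event = {
            "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "event": record.getMessage(),
        }
        # Extra fields (trace_id, duration_ms, ...) can be attached via `extra=`.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

logger = logging.getLogger("agent_skills")
handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
logger.addHandler(handler)
logger.setLevel(os.environ.get("AGENT_SKILLS_LOG_LEVEL", "INFO"))

logger.info("skill.execute.start", extra={"fields": {"trace_id": "t-1"}})
```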

Runtime Events

Emitted from the execution pipeline:

  • skill.execute.start
  • skill.execute.completed
  • skill.execute.failed
  • step.execute.start
  • step.execute.completed
  • step.execute.failed
  • capability.execute.start
  • capability.execute.completed
  • capability.execute.failed

Common fields:

  • trace_id
  • skill_id
  • step_id
  • capability_id
  • binding_id
  • service_id
  • duration_ms
  • error_type
  • error_message
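The start/completed/failed lifecycle can be sketched as follows (illustrative; `emit()` stands in for the project's structured logger, and the field names follow the list above):

```python
import json
import time
import uuid

def emit(event, **fields):
    # Stand-in for the structured logger: one JSON object per line.
    print(json.dumps({"event": event, **fields}))

def run_capability(capability_id, fn, *args, trace_id=None):
    trace_id = trace_id or str(uuid.uuid4())  # auto-generated when absent
    emit("capability.execute.start", trace_id=trace_id, capability_id=capability_id)
    started = time.perf_counter()
    try:
        result = fn(*args)
    except Exception as exc:
        emit("capability.execute.failed", trace_id=trace_id,
             capability_id=capability_id,
             duration_ms=round((time.perf_counter() - started) * 1000, 3),
             error_type=type(exc).__name__, error_message=str(exc))
        raise
    emit("capability.execute.completed", trace_id=trace_id,
         capability_id=capability_id,
         duration_ms=round((time.perf_counter() - started) * 1000, 3))
    return result
```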

Parallel Step Events

When the DAG scheduler executes steps in parallel (see docs/SCHEDULER.md), step.execute.start / step.execute.completed events from concurrent steps may interleave in the log stream. Correlation is by step_id within a given trace_id. The skill.execute.completed event is emitted only after all steps finish.
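Re-grouping interleaved events offline is straightforward; for example (a sketch, assuming one JSON object per line as described above):

```python
import json
from collections import defaultdict

def group_step_events(lines):
    """Group step.execute.* events by (trace_id, step_id)."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        if event.get("event", "").startswith("step.execute."):
            groups[(event["trace_id"], event["step_id"])].append(event["event"])
    return dict(groups)

# Events from concurrent steps may interleave in the stream:
log_stream = [
    '{"event": "step.execute.start", "trace_id": "t-1", "step_id": "a"}',
    '{"event": "step.execute.start", "trace_id": "t-1", "step_id": "b"}',
    '{"event": "step.execute.completed", "trace_id": "t-1", "step_id": "b"}',
    '{"event": "step.execute.completed", "trace_id": "t-1", "step_id": "a"}',
]
```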

Service Events

Instrumented baseline services:

  • code.snippet.execute via service.code.snippet.execute.start and service.code.snippet.execute
  • web.page.fetch via service.web.page.fetch.start and service.web.page.fetch
  • pdf.document.read via service.pdf.document.read.start and service.pdf.document.read
  • audio.speech.transcribe via service.audio.speech.transcribe.start and service.audio.speech.transcribe

Each service event includes status (completed, rejected, or failed) and latency (duration_ms).

When service logs are emitted inside runtime execution, trace_id is propagated automatically through context.

Example Log

{"ts":"2026-03-11T10:21:07Z","event":"service.web.page.fetch","status":"completed","http_status":200,"scheme":"https","host":"www.google.com","duration_ms":595.717}

Notes

  • Logging does not alter return payloads for any capability.
  • Optional fields are sanitized to JSON-safe values.
  • Sensitive fields are redacted based on key names (for example: token, password, authorization, api_key, secret, cookie).
  • To reduce verbosity in CI/local runs, set AGENT_SKILLS_LOG_LEVEL=WARNING.
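The key-name redaction policy can be sketched as follows (the placeholder string is an assumption):

```python
SENSITIVE_KEYS = {"token", "password", "authorization", "api_key", "secret", "cookie"}

def redact(value):
    """Recursively replace values of sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```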

Skill Execution Audit Records

In addition to runtime logs, skill execution now supports persisted audit records written as JSONL.

Default path:

  • artifacts/runtime_skill_audit.jsonl

Override path:

  • AGENT_SKILLS_AUDIT_PATH

Default mode:

  • AGENT_SKILLS_AUDIT_DEFAULT_MODE (standard by default)

Supported per-execution modes:

  • off: no persisted audit record for the execution
  • standard: metadata + hashes only (lightweight)
  • full: includes redacted payload snapshots per run and per step

Audit records include:

  • run metadata (trace_id, skill_id, status, channel, duration)
  • per-step metadata (step_id, uses, status, duration, binding/service IDs)
  • fallback and conformance metadata when available
  • hash references for inputs/outputs (sha256:*)

Sensitive values are redacted by key-name policy similar to runtime logs.
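A hash reference of the form sha256:* can be derived from a canonical JSON encoding; a sketch (the exact canonicalization used by the project may differ):

```python
import hashlib
import json

def hash_ref(payload):
    """Return a 'sha256:<hex>' reference for a JSON-serializable payload."""
    # Sorted keys and compact separators make the encoding order-independent.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```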

CLI examples:

  • python cli/main.py run text.simple-summarize --audit-mode standard
  • python cli/main.py trace text.simple-summarize --audit-mode full
  • python cli/main.py audit-purge --older-than-days 30
  • python cli/main.py audit-purge --trace-id <trace-id>
  • python cli/main.py audit-purge --all

End-to-End Validation Checklist

Use this checklist to validate that audit persistence is operational and mode-aware.

  1. Reset previous audit records:
python cli/main.py audit-purge --all
  2. Run a multi-step skill in standard mode:
python cli/main.py trace text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode standard

Expected:

  • Skill completes successfully.
  • A new JSONL record is written.
  • Record contains input_hash/output_hash and per-step hashes.
  • Record does not include full inputs, outputs, or per-step payload snapshots.

  3. Run the same skill in full mode:

python cli/main.py run text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode full

Expected:

  • Skill completes successfully.
  • A second JSONL record is written.
  • Record includes the same hashes plus redacted payload snapshots:

    • top-level inputs and outputs
    • per-step resolved_input and produced_output
  4. Verify audit file contents:

Get-Content artifacts/runtime_skill_audit.jsonl   # PowerShell; use cat on Unix shells
  5. Validate user-managed deletion:
python cli/main.py audit-purge --trace-id <trace-id>

Expected:

  • Purge response reports deleted > 0 for matching records.
  • Remaining records are preserved.

OpenTelemetry (Optional)

agent-skills ships an optional OTel integration that emits distributed tracing spans without requiring a hard dependency on the SDK.

Installation

pip install "orca-agent-skills[otel]"
# or individually:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

When the opentelemetry-api package is importable, the engine automatically emits spans. When it is absent, every helper degrades to a no-op: zero overhead, no code guards needed.

Spans Emitted

| Span Name     | Scope     | Key Attributes                        |
| ------------- | --------- | ------------------------------------- |
| skill.execute | Per-skill | skill.id, skill.trace_id, skill.depth |
| step.execute  | Per-step  | step.id, step.uses, skill.id          |

Errors are recorded on the span via record_exception + StatusCode.ERROR.

Configuration

Set standard OTel environment variables:

| Variable                    | Example                           |
| --------------------------- | --------------------------------- |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317             |
| OTEL_SERVICE_NAME           | agent-skills                      |
| OTEL_RESOURCE_ATTRIBUTES    | deployment.environment=production |

For OTLP/gRPC export with the SDK auto-configuration:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=agent-skills
agent-skills serve

Programmatic Setup

If you need custom configuration (e.g. a batch span processor or a custom resource), configure OTel before calling the engine:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# Now run agent-skills; spans flow automatically

Module Reference

runtime.otel_integration exposes:

  • otel_available() → bool — check if the SDK is installed
  • get_tracer() — returns a tracer or None
  • start_span(name, attributes) — context manager yielding a span (or _NoopSpan)
  • record_exception(span, exc) — record the exception and set ERROR status
  • @traced(name) — decorator wrapping a function in a span
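The no-op degradation described for the OTel integration follows a common pattern; a generic sketch (not the module's actual source, names mirror the module reference):

```python
from contextlib import contextmanager

try:
    from opentelemetry import trace as _otel_trace
except ImportError:
    _otel_trace = None  # OTel not installed; helpers become no-ops

class _NoopSpan:
    """Accepts the span API surface and does nothing."""
    def set_attribute(self, key, value): pass
    def record_exception(self, exc): pass

def otel_available():
    return _otel_trace is not None

@contextmanager
def start_span(name, attributes=None):
    if _otel_trace is None:
        yield _NoopSpan()  # zero overhead, no guards needed at call sites
        return
    tracer = _otel_trace.get_tracer("agent-skills")
    with tracer.start_as_current_span(name) as span:
        for key, value in (attributes or {}).items():
            span.set_attribute(key, value)
        yield span
```

Call sites can then open spans unconditionally, whether or not the OTel packages are present.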

CLI Trace Correlation

You can provide a correlation id from CLI:

  • python cli/main.py run <skill_id> --trace-id my-trace-001
  • python cli/main.py trace <skill_id> --trace-id my-trace-001

CognitiveState v1 Execution Trace

In addition to runtime events and structured logs, CognitiveState v1 adds a typed trace structure directly in ExecutionState.trace.

TraceState

Available after execution at state.trace:

state.trace.steps    # list[TraceStep] — one per executed step
state.trace.metrics  # TraceMetrics — live aggregate counters

TraceStep (per step)

Each step generates a frozen trace entry with data lineage:

| Field         | Type       | Description                              |
| ------------- | ---------- | ---------------------------------------- |
| step_id       | str        | Step identifier                          |
| capability_id | str        | Capability invoked                       |
| status        | str        | completed or failed                      |
| started_at    | datetime   | Step start timestamp                     |
| ended_at      | datetime   | Step end timestamp                       |
| reads         | tuple[str] | References resolved during input mapping |
| writes        | tuple[str] | Targets written during output mapping    |
| latency_ms    | int        | Wall-clock time in milliseconds          |

Reads and writes enable dependency analysis, impact analysis when a step fails, and audit of data flow through capabilities.
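Such lineage supports simple impact analysis; a sketch over TraceStep-like records (field names as in the table above, steps assumed to be in execution order):

```python
def impacted_steps(steps, failed_step_id):
    """Return ids of downstream steps that transitively read what the failed step wrote."""
    tainted = {w for s in steps if s["step_id"] == failed_step_id for w in s["writes"]}
    impacted = []
    for step in steps:
        if step["step_id"] == failed_step_id:
            continue
        if tainted & set(step["reads"]):
            impacted.append(step["step_id"])
            tainted |= set(step["writes"])  # taint propagates transitively
    return impacted
```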

TraceMetrics (aggregate)

Updated after each step completes:

| Field      | Type | Description                                |
| ---------- | ---- | ------------------------------------------ |
| step_count | int  | Total steps executed                       |
| llm_calls  | int  | LLM invocations (extracted from step meta) |
| tool_calls | int  | Tool invocations                           |
| tokens_in  | int  | Input tokens consumed                      |
| tokens_out | int  | Output tokens generated                    |
| elapsed_ms | int  | Total accumulated wall-clock time          |

Relationship to Existing Observability

The three observability surfaces are complementary:

| Surface              | Location     | Audience                 | Granularity             |
| -------------------- | ------------ | ------------------------ | ----------------------- |
| Runtime events       | state.events | skill authors            | lifecycle events        |
| Structured logs      | JSON stdout  | ops/SRE                  | log lines with trace_id |
| CognitiveState trace | state.trace  | callers / analysis tools | typed per-step lineage  |

Full reference: docs/COGNITIVE_STATE_V1.md