Observability (Step 4)

This project now emits structured JSON logs for runtime orchestration and high-risk baseline services.

Goals

  • Trace each execution with stable event names.
  • Measure latency (duration_ms) at runtime, capability, and service levels.
  • Capture success/failure metadata without changing capability contracts.

Logger Configuration

  • Logger name: agent_skills
  • Format: single-line JSON per event
  • Timestamp field: ts (UTC, ISO-like)
  • Level env var: AGENT_SKILLS_LOG_LEVEL (default: INFO)
  • Correlation id: trace_id, supplied via runtime context (auto-generated if not provided)
  • Max string length in logs: AGENT_SKILLS_LOG_MAX_STR_LEN (default: 512)
  • Max collection items in logs: AGENT_SKILLS_LOG_MAX_ITEMS (default: 50)
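The configuration above can be approximated with the standard logging module; a minimal sketch (illustrative only, not this project's actual formatter):

```python
import json
import logging
import os
from datetime import datetime, timezone

class JsonLineFormatter(logging.Formatter):
    """Render each record as a single-line JSON object with a UTC `ts` field."""

    def format(self, record):
        event = {
            "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            "event": record.getMessage(),
        }
        # Extra fields (trace_id, duration_ms, ...) can be attached via `extra=`.
        event.update(getattr(record, "fields", {}))
        return json.dumps(event)

logger = logging.getLogger("agent_skills")
handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
logger.addHandler(handler)
logger.setLevel(os.environ.get("AGENT_SKILLS_LOG_LEVEL", "INFO"))

logger.info("skill.execute.start", extra={"fields": {"trace_id": "t-1"}})
```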

Runtime Events

Emitted from the execution pipeline:

  • skill.execute.start
  • skill.execute.completed
  • skill.execute.failed
  • step.execute.start
  • step.execute.completed
  • step.execute.failed
  • capability.execute.start
  • capability.execute.completed
  • capability.execute.failed

Common fields:

  • trace_id
  • skill_id
  • step_id
  • capability_id
  • binding_id
  • service_id
  • duration_ms
  • error_type
  • error_message
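The start/completed/failed lifecycle can be sketched as follows (illustrative; `emit()` stands in for the project's structured logger, and the field names follow the list above):

```python
import json
import time
import uuid

def emit(event, **fields):
    # Stand-in for the structured logger: one JSON object per line.
    print(json.dumps({"event": event, **fields}))

def run_capability(capability_id, fn, *args, trace_id=None):
    trace_id = trace_id or str(uuid.uuid4())  # auto-generated when absent
    emit("capability.execute.start", trace_id=trace_id, capability_id=capability_id)
    started = time.perf_counter()
    try:
        result = fn(*args)
    except Exception as exc:
        emit("capability.execute.failed", trace_id=trace_id,
             capability_id=capability_id,
             duration_ms=round((time.perf_counter() - started) * 1000, 3),
             error_type=type(exc).__name__, error_message=str(exc))
        raise
    emit("capability.execute.completed", trace_id=trace_id,
         capability_id=capability_id,
         duration_ms=round((time.perf_counter() - started) * 1000, 3))
    return result
```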

Parallel Step Events

When the DAG scheduler executes steps in parallel (see docs/SCHEDULER.md), step.execute.start / step.execute.completed events from concurrent steps may interleave in the log stream. Correlation is by step_id within a given trace_id. The skill.execute.completed event is emitted only after all steps finish.
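Re-grouping interleaved events offline is straightforward; for example (a sketch, assuming one JSON object per line as described above):

```python
import json
from collections import defaultdict

def group_step_events(lines):
    """Group step.execute.* events by (trace_id, step_id)."""
    groups = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        if event.get("event", "").startswith("step.execute."):
            groups[(event["trace_id"], event["step_id"])].append(event["event"])
    return dict(groups)

# Events from concurrent steps may interleave in the stream:
log_stream = [
    '{"event": "step.execute.start", "trace_id": "t-1", "step_id": "a"}',
    '{"event": "step.execute.start", "trace_id": "t-1", "step_id": "b"}',
    '{"event": "step.execute.completed", "trace_id": "t-1", "step_id": "b"}',
    '{"event": "step.execute.completed", "trace_id": "t-1", "step_id": "a"}',
]
```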

Service Events

Instrumented baseline services:

  • code.snippet.execute via service.code.snippet.execute.start and service.code.snippet.execute
  • web.page.fetch via service.web.page.fetch.start and service.web.page.fetch
  • pdf.document.read via service.pdf.document.read.start and service.pdf.document.read
  • audio.speech.transcribe via service.audio.speech.transcribe.start and service.audio.speech.transcribe

Each service event includes status (completed, rejected, or failed) and latency (duration_ms).

When service logs are emitted inside runtime execution, trace_id is propagated automatically through context.

Example Log

{"ts":"2026-03-11T10:21:07Z","event":"service.web.page.fetch","status":"completed","http_status":200,"scheme":"https","host":"www.google.com","duration_ms":595.717}

Notes

  • Logging does not alter return payloads for any capability.
  • Optional fields are sanitized to JSON-safe values.
  • Sensitive fields are redacted based on key names (for example: token, password, authorization, api_key, secret, cookie).
  • To reduce verbosity in CI/local runs, set AGENT_SKILLS_LOG_LEVEL=WARNING.
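The key-name redaction policy can be sketched as follows (the placeholder string is an assumption):

```python
SENSITIVE_KEYS = {"token", "password", "authorization", "api_key", "secret", "cookie"}

def redact(value):
    """Recursively replace values of sensitive keys with a placeholder."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```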

Skill Execution Audit Records

In addition to runtime logs, skill execution now supports persisted audit records written as JSONL.

Default path:

  • artifacts/runtime_skill_audit.jsonl

Override path:

  • AGENT_SKILLS_AUDIT_PATH

Default mode:

  • AGENT_SKILLS_AUDIT_DEFAULT_MODE (standard by default)

Supported per-execution modes:

  • off: no persisted audit record for the execution
  • standard: metadata + hashes only (lightweight)
  • full: includes redacted payload snapshots per run and per step

Audit records include:

  • run metadata (trace_id, skill_id, status, channel, duration)
  • per-step metadata (step_id, uses, status, duration, binding/service IDs)
  • fallback and conformance metadata when available
  • hash references for inputs/outputs (sha256:*)

Sensitive values are redacted by key-name policy similar to runtime logs.
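A hash reference of the form sha256:* can be derived from a canonical JSON encoding; a sketch (the exact canonicalization used by the project may differ):

```python
import hashlib
import json

def hash_ref(payload):
    """Return a 'sha256:<hex>' reference for a JSON-serializable payload."""
    # Sorted keys and compact separators make the encoding order-independent.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```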

CLI examples:

  • python cli/main.py run text.simple-summarize --audit-mode standard
  • python cli/main.py trace text.simple-summarize --audit-mode full
  • python cli/main.py audit-purge --older-than-days 30
  • python cli/main.py audit-purge --trace-id <trace-id>
  • python cli/main.py audit-purge --all

End-to-End Validation Checklist

Use this checklist to validate that audit persistence is operational and mode-aware.

  1. Reset previous audit records:
python cli/main.py audit-purge --all
  2. Run a multi-step skill in standard mode:
python cli/main.py trace text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode standard

Expected:

  • Skill completes successfully.
  • A new JSONL record is written.
  • Record contains input_hash/output_hash and per-step hashes.
  • Record does not include full inputs, outputs, or per-step payload snapshots.

  3. Run the same skill in full mode:

python cli/main.py run text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode full

Expected:

  • Skill completes successfully.
  • A second JSONL record is written.
  • Record includes the same hashes plus redacted payload snapshots:

    • top-level inputs and outputs
    • per-step resolved_input and produced_output
  4. Verify audit file contents:

Get-Content artifacts/runtime_skill_audit.jsonl   # PowerShell; use cat on Unix shells
  5. Validate user-managed deletion:
python cli/main.py audit-purge --trace-id <trace-id>

Expected:

  • Purge response reports deleted > 0 for matching records.
  • Remaining records are preserved.

OpenTelemetry (Optional)

agent-skills ships an optional OTel integration that emits distributed tracing spans without requiring a hard dependency on the SDK.

Installation

pip install "orca-agent-skills[otel]"
# or individually:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

When the opentelemetry-api package is importable, the engine automatically emits spans. When it is absent, every helper degrades to a no-op: zero overhead, no code guards needed.

Spans Emitted

| Span Name     | Scope     | Key Attributes                        |
| ------------- | --------- | ------------------------------------- |
| skill.execute | Per-skill | skill.id, skill.trace_id, skill.depth |
| step.execute  | Per-step  | step.id, step.uses, skill.id          |

Errors are recorded on the span via record_exception + StatusCode.ERROR.

Configuration

Set standard OTel environment variables:

| Variable                    | Example                           |
| --------------------------- | --------------------------------- |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:4317             |
| OTEL_SERVICE_NAME           | agent-skills                      |
| OTEL_RESOURCE_ATTRIBUTES    | deployment.environment=production |

For OTLP/gRPC export with the SDK auto-configuration:

export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=agent-skills
agent-skills serve

Programmatic Setup

If you need custom configuration (e.g. a batch span processor or a custom resource), configure OTel before calling the engine:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)

# Now run agent-skills; spans flow automatically

Module Reference

runtime.otel_integration exposes:

  • otel_available() → bool — check if the SDK is installed
  • get_tracer() — returns a tracer or None
  • start_span(name, attributes) — context manager yielding a span (or _NoopSpan)
  • record_exception(span, exc) — record the exception and set ERROR status
  • @traced(name) — decorator wrapping a function in a span
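The no-op degradation described for the OTel integration follows a common pattern; a generic sketch (not the module's actual source, names mirror the module reference):

```python
from contextlib import contextmanager

try:
    from opentelemetry import trace as _otel_trace
except ImportError:
    _otel_trace = None  # OTel not installed; helpers become no-ops

class _NoopSpan:
    """Accepts the span API surface and does nothing."""
    def set_attribute(self, key, value): pass
    def record_exception(self, exc): pass

def otel_available():
    return _otel_trace is not None

@contextmanager
def start_span(name, attributes=None):
    if _otel_trace is None:
        yield _NoopSpan()  # zero overhead, no guards needed at call sites
        return
    tracer = _otel_trace.get_tracer("agent-skills")
    with tracer.start_as_current_span(name) as span:
        for key, value in (attributes or {}).items():
            span.set_attribute(key, value)
        yield span
```

Call sites can then open spans unconditionally, whether or not the OTel packages are present.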

CLI Trace Correlation

You can provide a correlation id from CLI:

  • python cli/main.py run <skill_id> --trace-id my-trace-001
  • python cli/main.py trace <skill_id> --trace-id my-trace-001

CognitiveState v1 Execution Trace

In addition to runtime events and structured logs, CognitiveState v1 adds a typed trace structure directly in ExecutionState.trace.

TraceState

Available after execution at state.trace:

state.trace.steps    # list[TraceStep] — one per executed step
state.trace.metrics  # TraceMetrics — live aggregate counters

TraceStep (per step)

Each step generates a frozen trace entry with data lineage:

| Field         | Type       | Description                              |
| ------------- | ---------- | ---------------------------------------- |
| step_id       | str        | Step identifier                          |
| capability_id | str        | Capability invoked                       |
| status        | str        | completed or failed                      |
| started_at    | datetime   | Step start timestamp                     |
| ended_at      | datetime   | Step end timestamp                       |
| reads         | tuple[str] | References resolved during input mapping |
| writes        | tuple[str] | Targets written during output mapping    |
| latency_ms    | int        | Wall-clock time in milliseconds          |

Reads and writes enable dependency analysis, impact analysis when a step fails, and audit of data flow through capabilities.
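Such lineage supports simple impact analysis; a sketch over TraceStep-like records (field names as in the table above, steps assumed to be in execution order):

```python
def impacted_steps(steps, failed_step_id):
    """Return ids of downstream steps that transitively read what the failed step wrote."""
    tainted = {w for s in steps if s["step_id"] == failed_step_id for w in s["writes"]}
    impacted = []
    for step in steps:
        if step["step_id"] == failed_step_id:
            continue
        if tainted & set(step["reads"]):
            impacted.append(step["step_id"])
            tainted |= set(step["writes"])  # taint propagates transitively
    return impacted
```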

TraceMetrics (aggregate)

Updated after each step completes:

| Field      | Type | Description                                |
| ---------- | ---- | ------------------------------------------ |
| step_count | int  | Total steps executed                       |
| llm_calls  | int  | LLM invocations (extracted from step meta) |
| tool_calls | int  | Tool invocations                           |
| tokens_in  | int  | Input tokens consumed                      |
| tokens_out | int  | Output tokens generated                    |
| elapsed_ms | int  | Total accumulated wall-clock time          |

Relationship to Existing Observability

The three observability surfaces are complementary:

| Surface              | Location     | Audience                 | Granularity             |
| -------------------- | ------------ | ------------------------ | ----------------------- |
| Runtime events       | state.events | skill authors            | lifecycle events        |
| Structured logs      | JSON stdout  | ops/SRE                  | log lines with trace_id |
| CognitiveState trace | state.trace  | callers / analysis tools | typed per-step lineage  |

Full reference: docs/COGNITIVE_STATE_V1.md