Observability (Step 4)¶
This project now emits structured JSON logs for runtime orchestration and high-risk baseline services.
Goals¶
- Trace each execution with stable event names.
- Measure latency (`duration_ms`) at runtime, capability, and service levels.
- Capture success/failure metadata without changing capability contracts.
Logger Configuration¶
- Logger name: `agent_skills`
- Format: single-line JSON per event
- Timestamp field: `ts` (UTC, ISO-like)
- Level env var: `AGENT_SKILLS_LOG_LEVEL` (default: `INFO`)
- Correlation env var support via runtime context: `trace_id` (auto-generated if not provided)
- Max string length in logs: `AGENT_SKILLS_LOG_MAX_STR_LEN` (default: `512`)
- Max collection items in logs: `AGENT_SKILLS_LOG_MAX_ITEMS` (default: `50`)
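The two size limits can be pictured with a small sketch. This is illustrative only: the helper name `truncate_for_log` and the truncation marker are assumptions, not the library's actual implementation.

```python
import os

# Read the documented env vars with their documented defaults.
MAX_STR_LEN = int(os.getenv("AGENT_SKILLS_LOG_MAX_STR_LEN", "512"))
MAX_ITEMS = int(os.getenv("AGENT_SKILLS_LOG_MAX_ITEMS", "50"))

def truncate_for_log(value):
    """Clamp strings and collections to the configured logging limits."""
    if isinstance(value, str) and len(value) > MAX_STR_LEN:
        return value[:MAX_STR_LEN] + "...[truncated]"
    if isinstance(value, (list, tuple)) and len(value) > MAX_ITEMS:
        return list(value[:MAX_ITEMS]) + ["...[truncated]"]
    return value
```

Raising either env var before the process starts loosens the corresponding limit.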
Runtime Events¶
Emitted from the execution pipeline:
- `skill.execute.start`
- `skill.execute.completed`
- `skill.execute.failed`
- `step.execute.start`
- `step.execute.completed`
- `step.execute.failed`
- `capability.execute.start`
- `capability.execute.completed`
- `capability.execute.failed`
Common fields:
`trace_id`, `skill_id`, `step_id`, `capability_id`, `binding_id`, `service_id`, `duration_ms`, `error_type`, `error_message`
Parallel Step Events¶
When the DAG scheduler executes steps in parallel (see docs/SCHEDULER.md),
step.execute.start / step.execute.completed events from concurrent steps
may interleave in the log stream. Correlation is by step_id within a given
trace_id. The skill.execute.completed event is emitted only after all
steps finish.
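This correlation can be sketched in a few lines of Python. The log lines below are fabricated for illustration; only the event names and fields come from this page.

```python
import json
from collections import defaultdict

# Interleaved events from two concurrent steps within one trace.
log_lines = [
    '{"event":"step.execute.start","trace_id":"t1","step_id":"a"}',
    '{"event":"step.execute.start","trace_id":"t1","step_id":"b"}',
    '{"event":"step.execute.completed","trace_id":"t1","step_id":"b"}',
    '{"event":"step.execute.completed","trace_id":"t1","step_id":"a"}',
]

# Group by (trace_id, step_id) to recover each step's lifecycle.
by_step = defaultdict(list)
for line in log_lines:
    evt = json.loads(line)
    by_step[(evt["trace_id"], evt["step_id"])].append(evt["event"])
```

Despite the interleaving, each step ends up with a matched start/completed pair.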
Service Events¶
Instrumented baseline services:
- `code.snippet.execute` via `service.code.snippet.execute.start` and `service.code.snippet.execute`
- `web.page.fetch` via `service.web.page.fetch.start` and `service.web.page.fetch`
- `pdf.document.read` via `service.pdf.document.read.start` and `service.pdf.document.read`
- `audio.speech.transcribe` via `service.audio.speech.transcribe.start` and `service.audio.speech.transcribe`
Each service event includes `status` (`completed`, `rejected`, `failed`) and latency.
When service logs are emitted inside runtime execution, trace_id is propagated automatically through context.
Example Log¶
{"ts":"2026-03-11T10:21:07Z","event":"service.web.page.fetch","status":"completed","http_status":200,"scheme":"https","host":"www.google.com","duration_ms":595.717}
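Because each event is a single JSON line, the example above can be consumed with any JSON-per-line tooling:

```python
import json

# The example log line from this page, parsed back into a dict.
line = '{"ts":"2026-03-11T10:21:07Z","event":"service.web.page.fetch","status":"completed","http_status":200,"scheme":"https","host":"www.google.com","duration_ms":595.717}'
evt = json.loads(line)
print(evt["event"], evt["duration_ms"])  # service.web.page.fetch 595.717
```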
Notes¶
- Logging does not alter return payloads for any capability.
- Optional fields are sanitized to JSON-safe values.
- Sensitive fields are redacted based on key names (for example: `token`, `password`, `authorization`, `api_key`, `secret`, `cookie`).
- To reduce verbosity in CI/local runs, set `AGENT_SKILLS_LOG_LEVEL=WARNING`.
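A minimal sketch of key-name redaction as described above. This is illustrative: the real policy lives inside the library, may match more keys, and may use a different masking value.

```python
# Keys listed in the note above; the actual policy may be broader.
SENSITIVE_KEYS = {"token", "password", "authorization", "api_key", "secret", "cookie"}

def redact(obj):
    """Recursively mask values whose key name looks sensitive."""
    if isinstance(obj, dict):
        return {
            k: "***REDACTED***" if k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in obj.items()
        }
    if isinstance(obj, list):
        return [redact(v) for v in obj]
    return obj
```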
Skill Execution Audit Records¶
In addition to runtime logs, skill execution now supports persisted audit records written as JSONL.
- Default path: `artifacts/runtime_skill_audit.jsonl`
- Override path: `AGENT_SKILLS_AUDIT_PATH`
- Default mode: `AGENT_SKILLS_AUDIT_DEFAULT_MODE` (`standard` by default)
Supported per-execution modes:
- `off`: no persisted audit record for the execution
- `standard`: metadata + hashes only (lightweight)
- `full`: includes redacted payload snapshots per run and per step
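The difference between the modes can be sketched as follows. `build_audit_record` is a hypothetical helper invented for illustration, not the library's actual function, and real records carry far more metadata.

```python
import hashlib
import json

def _hash(payload):
    # Canonicalize, then hash; the "sha256:" prefix matches the
    # hash-reference style used in audit records.
    blob = json.dumps(payload, sort_keys=True).encode()
    return "sha256:" + hashlib.sha256(blob).hexdigest()

def build_audit_record(mode, inputs, outputs):
    if mode == "off":
        return None  # nothing is persisted
    record = {"input_hash": _hash(inputs), "output_hash": _hash(outputs)}
    if mode == "full":
        record["inputs"] = inputs    # would be redacted in practice
        record["outputs"] = outputs
    return record
```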
Audit records include:
- run metadata (`trace_id`, `skill_id`, status, channel, duration)
- per-step metadata (`step_id`, `uses`, status, duration, binding/service IDs)
- fallback and conformance metadata when available
- hash references for inputs/outputs (`sha256:*`)
Sensitive values are redacted by key-name policy similar to runtime logs.
CLI examples:
python cli/main.py run text.simple-summarize --audit-mode standard
python cli/main.py trace text.simple-summarize --audit-mode full
python cli/main.py audit-purge --older-than-days 30
python cli/main.py audit-purge --trace-id <trace-id>
python cli/main.py audit-purge --all
End-to-End Validation Checklist¶
Use this checklist to validate that audit persistence is operational and mode-aware.
- Reset previous audit records:
python cli/main.py audit-purge --all
- Run a multi-step skill in `standard` mode:
python cli/main.py trace text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode standard
Expected:
- Skill completes successfully.
- A new JSONL record is written.
- Record contains `input_hash`/`output_hash` and per-step hashes.
- Record does not include full `inputs`, `outputs`, or per-step payload snapshots.
- Run the same skill in `full` mode:
python cli/main.py run text.detect-language-and-classify --input-file artifacts/e2e_input_message.json --audit-mode full
Expected:
- Skill completes successfully.
- A second JSONL record is written.
- Record includes the same hashes plus redacted payload snapshots:
  - top-level `inputs` and `outputs`
  - per-step `resolved_input` and `produced_output`
- Verify audit file contents:
Get-Content artifacts/runtime_skill_audit.jsonl
- Validate user-managed deletion:
python cli/main.py audit-purge --trace-id <trace-id>
Expected:
- Purge response reports `deleted` > 0 for matching records.
- Remaining records are preserved.
OpenTelemetry (Optional)¶
agent-skills ships an optional OTel integration that emits distributed
tracing spans without requiring a hard dependency on the SDK.
Installation¶
pip install "orca-agent-skills[otel]"
# or individually:
pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
When the opentelemetry-api package is importable, the engine automatically
emits spans. When it is absent, every helper degrades to a no-op: zero
overhead, no code guards needed.
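The graceful-degradation pattern can be sketched like this. It is illustrative only; the real helpers live in `runtime.otel_integration` and may differ in detail.

```python
from contextlib import contextmanager

# Import the OTel API if present; otherwise fall back to no-ops.
try:
    from opentelemetry import trace
    _tracer = trace.get_tracer("agent_skills")
except ImportError:
    _tracer = None

class _NoopSpan:
    """Inert stand-in used when the OpenTelemetry API is not installed."""
    def set_attribute(self, key, value): pass
    def record_exception(self, exc): pass

@contextmanager
def start_span(name, attributes=None):
    if _tracer is None:
        yield _NoopSpan()  # caller code is identical either way
    else:
        with _tracer.start_as_current_span(name, attributes=attributes) as span:
            yield span
```

Calling code never branches on availability; it always gets an object with the span interface.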
Spans Emitted¶
| Span Name | Scope | Key Attributes |
|---|---|---|
| `skill.execute` | Per-skill | `skill.id`, `skill.trace_id`, `skill.depth` |
| `step.execute` | Per-step | `step.id`, `step.uses`, `skill.id` |
Errors are recorded on the span via `record_exception` + `StatusCode.ERROR`.
Configuration¶
Set standard OTel environment variables:
| Variable | Example |
|---|---|
| `OTEL_EXPORTER_OTLP_ENDPOINT` | `http://localhost:4317` |
| `OTEL_SERVICE_NAME` | `agent-skills` |
| `OTEL_RESOURCE_ATTRIBUTES` | `deployment.environment=production` |
For OTLP/gRPC export with the SDK auto-configuration:
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
export OTEL_SERVICE_NAME=agent-skills
agent-skills serve
Programmatic Setup¶
If you need custom configuration (e.g. batch span processor, custom resource), configure OTel before calling the engine:
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
# Now run agent-skills — spans flow automatically
Module Reference¶
runtime.otel_integration exposes:
- `otel_available() → bool`: check if the SDK is installed
- `get_tracer()`: returns a tracer or `None`
- `start_span(name, attributes)`: context manager yielding a span (or `_NoopSpan`)
- `record_exception(span, exc)`: record the exception and set ERROR status
- `@traced(name)`: decorator wrapping a function in a span
CLI Trace Correlation¶
You can provide a correlation id from CLI:
python cli/main.py run <skill_id> --trace-id my-trace-001
python cli/main.py trace <skill_id> --trace-id my-trace-001
CognitiveState v1 Execution Trace¶
In addition to runtime events and structured logs, CognitiveState v1 adds a
typed trace structure directly in ExecutionState.trace.
TraceState¶
Available after execution at state.trace:
state.trace.steps # list[TraceStep] — one per executed step
state.trace.metrics # TraceMetrics — live aggregate counters
TraceStep (per step)¶
Each step generates a frozen trace entry with data lineage:
| Field | Type | Description |
|---|---|---|
| step_id | str | Step identifier |
| capability_id | str | Capability invoked |
| status | str | completed or failed |
| started_at | datetime | Step start timestamp |
| ended_at | datetime | Step end timestamp |
| reads | tuple[str] | References resolved during input mapping |
| writes | tuple[str] | Targets written during output mapping |
| latency_ms | int | Wall-clock time in milliseconds |
Reads and writes enable dependency analysis, impact analysis when a step fails, and audit of data flow through capabilities.
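A sketch of such an impact analysis, using a hypothetical miniature of `TraceStep` reduced to the lineage fields (step names and references below are invented for illustration):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceStep:
    step_id: str
    reads: tuple
    writes: tuple

# Steps in execution order; each reads what an earlier step wrote.
steps = [
    TraceStep("detect", reads=("inputs.text",), writes=("state.language",)),
    TraceStep("classify", reads=("state.language",), writes=("state.category",)),
    TraceStep("report", reads=("state.category",), writes=("outputs.result",)),
]

def downstream_of(failed_step_id):
    """Steps whose reads depend, transitively, on the failed step's writes."""
    tainted = set()
    for step in steps:
        if step.step_id == failed_step_id:
            tainted.update(step.writes)
    impacted = []
    for step in steps:  # relies on execution order
        if tainted.intersection(step.reads):
            impacted.append(step.step_id)
            tainted.update(step.writes)
    return impacted
```

A failure in the first step taints everything downstream that reads its outputs, directly or indirectly.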
TraceMetrics (aggregate)¶
Updated after each step completes:
| Field | Type | Description |
|---|---|---|
| step_count | int | Total steps executed |
| llm_calls | int | LLM invocations (extracted from step meta) |
| tool_calls | int | Tool invocations |
| tokens_in | int | Input tokens consumed |
| tokens_out | int | Output tokens generated |
| elapsed_ms | int | Total accumulated wall-clock time |
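The aggregation behavior can be pictured with a hypothetical stand-in (the real `TraceMetrics` lives in the library and extracts these counters from step meta):

```python
from dataclasses import dataclass

@dataclass
class TraceMetrics:
    step_count: int = 0
    llm_calls: int = 0
    tool_calls: int = 0
    tokens_in: int = 0
    tokens_out: int = 0
    elapsed_ms: int = 0

    def record_step(self, meta):
        """Bump counters after each step completes, mirroring the table above."""
        self.step_count += 1
        self.llm_calls += meta.get("llm_calls", 0)
        self.tool_calls += meta.get("tool_calls", 0)
        self.tokens_in += meta.get("tokens_in", 0)
        self.tokens_out += meta.get("tokens_out", 0)
        self.elapsed_ms += meta.get("latency_ms", 0)
```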
Relationship to Existing Observability¶
The three observability surfaces are complementary:
| Surface | Location | Audience | Granularity |
|---|---|---|---|
| Runtime events | state.events | skill authors | lifecycle events |
| Structured logs | JSON stdout | ops/SRE | log lines with trace_id |
| CognitiveState trace | state.trace | callers / analysis tools | typed per-step lineage |
Full reference: docs/COGNITIVE_STATE_V1.md