Runner Guide¶
This document explains how the runtime runner works, from skill input to final outputs.
1) What the Runner Is¶
The runner is the execution subsystem in runtime/ that:
- loads skill and capability definitions
- builds an execution plan
- resolves and executes each step
- maps capability/service responses back into skill outputs
- emits execution events and structured logs
The primary entry point for manual use is cli/main.py.
2) End-to-End Flow¶
Execution path (high-level):
- CLI builds ExecutionRequest
- ExecutionEngine loads skill and initializes ExecutionState
- Scheduler builds DAG from step config.depends_on declarations
- Scheduler dispatches steps (parallel when dependencies allow, sequential by default):
- InputMapper resolves step input from inputs/vars/outputs refs
- If uses starts with skill:, NestedSkillRunner executes recursively
- Else CapabilityExecutor delegates to BindingExecutor
- BindingExecutor pipeline:
- BindingResolver selects binding
- ServiceResolver resolves service descriptor
- RequestBuilder builds protocol payload from input.* template
- ProtocolRouter dispatches to protocol invoker
- ResponseMapper maps response.* into capability outputs
- OutputMapper writes step outputs to vars.* and outputs.*
- ExecutionEngine validates required final outputs
- SkillExecutionResult is returned
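The RequestBuilder and ResponseMapper steps in the flow above can be pictured as template resolution against a single namespace. The following is a minimal sketch, not the runtime's implementation: the function names, the MappingError class, and the flat target-to-reference template shape are assumptions made for illustration. It fails fast on missing fields, in line with the strict mapping behavior described under Design Constraints.

```python
from typing import Any, Mapping


class MappingError(Exception):
    """Hypothetical error: a template points at a field that does not exist."""


def resolve_path(root: Mapping[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as 'input.text' through nested mappings."""
    current: Any = root
    for part in dotted.split("."):
        if not isinstance(current, Mapping) or part not in current:
            raise MappingError(f"missing field: {dotted!r}")
        current = current[part]
    return current


def apply_template(template: Mapping[str, str], namespace: str,
                   data: Mapping[str, Any]) -> dict:
    """Resolve every '<namespace>.*' reference in the template against data."""
    root = {namespace: data}
    result = {}
    for target, ref in template.items():
        if not ref.startswith(namespace + "."):
            raise MappingError(f"{ref!r} is not a {namespace}.* reference")
        result[target] = resolve_path(root, ref)
    return result


# Request side: build a protocol payload from input.* references.
step_input = {"text": "hello", "lang": "en"}
payload = apply_template({"body": "input.text", "locale": "input.lang"},
                         "input", step_input)

# Response side: map response.* references into capability outputs.
response = {"result": {"summary": "hi"}}
outputs = apply_template({"summary": "response.result.summary"},
                         "response", response)
```

Using one resolver for both directions keeps the failure mode identical on either side of the invocation: a bad reference raises before any partial output is written.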
3) Core Runtime Modules¶
State and model layer:
- runtime/models.py: typed runtime data contracts (includes CognitiveState v1 structures)
- runtime/execution_state.py: mutable execution state + runtime events
- runtime/errors.py: typed runtime exceptions
Planning and orchestration:
- runtime/skill_loader.py: loads and normalizes skill specs
- runtime/capability_loader.py: loads and normalizes capability specs
- runtime/execution_planner.py: prepares step order
- runtime/scheduler.py: DAG-based step scheduler (parallel/sequential)
- runtime/execution_engine.py: orchestrates whole run
- runtime/nested_skill_runner.py: executes skill: steps
Step input/output mapping:
- runtime/reference_resolver.py: resolves data references (7 namespaces with path traversal)
- runtime/input_mapper.py: materializes step input
- runtime/output_mapper.py: writes produced values to runtime targets (5 writable namespaces, 4 merge strategies)
Binding execution layer:
- runtime/binding_registry.py: loads services, bindings, defaults
- runtime/active_binding_map.py: active override map
- runtime/binding_resolver.py: chooses effective binding
- runtime/service_resolver.py: resolves service descriptor
- runtime/request_builder.py: builds invocation payload
- runtime/protocol_router.py: routes by protocol kind
- runtime/response_mapper.py: maps invocation response to capability output
- runtime/binding_executor.py: full binding execution pipeline
- runtime/capability_executor.py: runtime adapter around binding executor
Protocol invokers:
- runtime/pythoncall_invoker.py
- runtime/openapi_invoker.py
- runtime/openrpc_invoker.py
- runtime/mcp_invoker.py
OpenAPI invoker runtime knobs (metadata-driven):
- binding metadata:
  - method (default POST)
  - timeout_seconds (overrides service/default timeout)
  - headers (string-to-string map merged over service headers)
  - response_mode (json default, text, or raw)
- service metadata:
  - timeout_seconds (used when binding does not override)
  - headers (base header map)
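The precedence rules for these knobs can be sketched as a small merge function. This is an illustration under assumptions, not the invoker's code: the function name, the returned dict shape, and the 30-second fallback are hypothetical; only the knob names, defaults, and override order come from the list above.

```python
from typing import Any, Mapping

# Hypothetical fallback used when neither binding nor service sets a timeout.
DEFAULT_TIMEOUT_SECONDS = 30.0

ALLOWED_RESPONSE_MODES = {"json", "text", "raw"}


def effective_openapi_settings(
    service_meta: Mapping[str, Any],
    binding_meta: Mapping[str, Any],
    default_timeout: float = DEFAULT_TIMEOUT_SECONDS,
) -> dict:
    """Combine service and binding metadata into one settings dict."""
    # Binding timeout wins, then service timeout, then the default.
    timeout = binding_meta.get(
        "timeout_seconds", service_meta.get("timeout_seconds", default_timeout)
    )
    # Service headers are the base map; binding headers are merged over them.
    headers = {**service_meta.get("headers", {}), **binding_meta.get("headers", {})}
    response_mode = binding_meta.get("response_mode", "json")
    if response_mode not in ALLOWED_RESPONSE_MODES:
        raise ValueError(f"unsupported response_mode: {response_mode!r}")
    return {
        "method": binding_meta.get("method", "POST"),
        "timeout_seconds": timeout,
        "headers": headers,
        "response_mode": response_mode,
    }


service = {"timeout_seconds": 10, "headers": {"Accept": "application/json", "X-Env": "dev"}}
binding = {"headers": {"X-Env": "prod"}, "response_mode": "text"}
settings = effective_openapi_settings(service, binding)
```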
Observability:
- runtime/observability.py: structured logs, trace context, redaction
4) CognitiveState v1¶
ExecutionState now includes four cognitive blocks (frame, working, output, trace) plus an extensions namespace, enabling structured multi-step reasoning without breaking the legacy vars/outputs pipeline.
Key additions:
- FrameState: immutable reasoning context (goal, constraints, success_criteria)
- WorkingState: mutable working memory with 10 typed cognitive slots
- OutputState: structured result metadata (result_type, summary, status_reason)
- TraceState: per-step data lineage (reads/writes) and live aggregate metrics
- extensions: open namespace for plugins
Reference resolution now supports 7 namespaces with path traversal through dataclass attributes, dict keys, and list indices:
inputs.*, vars.*, outputs.*, frame.*, working.*, output.*, extensions.*
Output mapping supports 5 writable namespaces with 4 merge strategies (overwrite, append, deep_merge, replace).
All features are backward-compatible. Existing skills are unaffected.
Full reference: docs/COGNITIVE_STATE_V1.md
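As an illustration of the path traversal described above, the sketch below resolves a dotted reference through dataclass attributes, dict keys, and list indices. It is not the code in runtime/reference_resolver.py: the resolve function, the Frame stand-in, and the choice of plain numeric segments for list indices (for example vars.items.1.name) are assumptions.

```python
from dataclasses import dataclass, field, is_dataclass
from typing import Any

NAMESPACES = ("inputs", "vars", "outputs", "frame", "working", "output", "extensions")


@dataclass(frozen=True)
class Frame:
    """Hypothetical stand-in for FrameState."""
    goal: str
    constraints: list = field(default_factory=list)


def resolve(ref: str, roots: dict) -> Any:
    """Resolve 'namespace.path.to.value' against the given namespace roots."""
    namespace, *path = ref.split(".")
    if namespace not in NAMESPACES:
        raise KeyError(f"unknown namespace: {namespace!r}")
    current = roots[namespace]
    for part in path:
        if is_dataclass(current) and not isinstance(current, type):
            current = getattr(current, part)   # dataclass attribute
        elif isinstance(current, dict):
            current = current[part]            # dict key
        elif isinstance(current, (list, tuple)):
            current = current[int(part)]       # list index (assumed syntax)
        else:
            raise KeyError(f"cannot traverse {part!r} in {ref!r}")
    return current


roots = {
    "frame": Frame(goal="summarize", constraints=["max 100 words"]),
    "vars": {"items": [{"name": "a"}, {"name": "b"}]},
    "inputs": {"text": "hello"},
}
goal = resolve("frame.goal", roots)
second_name = resolve("vars.items.1.name", roots)
```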
5) Trace and Events¶
Three complementary tracing surfaces exist:
- Runtime events (in ExecutionState.events)
  - event type/message/timestamp/step_id/trace_id/data
- Structured logs (JSON lines)
  - skill/step/capability lifecycle
  - service lifecycle for critical services
  - correlation through trace_id
- CognitiveState v1 trace (in ExecutionState.trace)
  - TraceStep per step: step_id, capability_id, status, reads, writes, latency_ms
  - TraceMetrics aggregate: step_count, llm_calls, tool_calls, tokens_in/out, elapsed_ms
  - Automatically populated by the execution engine
Trace propagation:
- trace_id can be passed in ExecutionRequest
- CLI supports --trace-id in run and trace commands
- nested skills inherit parent trace_id
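The propagation rules above can be sketched with two small records. This is a simplified model, not the runtime's types: Request stands in for ExecutionRequest, RuntimeEvent mirrors only the event fields listed earlier, and generating a fresh id when none is passed is an assumption about behavior the guide does not specify.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Request:
    """Hypothetical stand-in for ExecutionRequest."""
    skill_id: str
    trace_id: Optional[str] = None


@dataclass
class RuntimeEvent:
    """Mirrors the event fields: type/message/timestamp/step_id/trace_id/data."""
    type: str
    message: str
    trace_id: str
    step_id: Optional[str] = None
    data: dict = field(default_factory=dict)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def ensure_trace_id(request: Request) -> str:
    """Use the caller's trace_id, or create one (assumed fallback)."""
    return request.trace_id or uuid.uuid4().hex


def child_request(parent_trace_id: str, skill_id: str) -> Request:
    """Nested skills inherit the parent's trace_id unchanged."""
    return Request(skill_id=skill_id, trace_id=parent_trace_id)


parent = Request(skill_id="parent.skill", trace_id="trace-001")
trace_id = ensure_trace_id(parent)
nested = child_request(trace_id, "child.skill")
event = RuntimeEvent(type="step_started", message="step s1 started",
                     trace_id=trace_id, step_id="s1")
```

Because every event and nested request carries the same id, a single filter on trace_id is enough to reassemble one run across skill boundaries.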
6) How to Run¶
Basic execution:
- python cli/main.py run
- python cli/main.py run --input "{\"key\":\"value\"}"
- python cli/main.py run --input-file input.json
Execution with trace correlation:
- python cli/main.py run --trace-id trace-001
- python cli/main.py trace --trace-id trace-001
System checks:
- python cli/main.py doctor
OpenAPI checks from CLI:
- python cli/main.py openapi verify-bindings --all
- python cli/main.py openapi verify-bindings --scenario tooling/openapi_scenarios/data.schema.validate.mock.json
- python cli/main.py openapi verify-invoker
- python cli/main.py openapi verify-errors
7) Validation and Health Commands¶
Contracts:
- python tooling/test_capability_contracts.py
Smoke:
- python tooling/verify_smoke_capabilities.py --report-file artifacts/smoke_report.json
Coverage and consistency:
- python tooling/compute_runtime_coverage.py
- python tooling/compute_runtime_stats.py
- python tooling/compute_skill_executability.py
Registry side:
- python ../agent-skill-registry/tools/validate_registry.py
- python ../agent-skill-registry/tools/generate_catalog.py
- python ../agent-skill-registry/tools/registry_stats.py
8) Failure Model (Practical)¶
Common failure categories:
- Input mapping errors: missing input.* fields required by a step
- Binding resolution errors: no binding or invalid default selection
- Service resolution errors: binding points to missing/invalid service
- Request/response mapping errors: template points to missing fields
- Protocol routing/invocation errors: unsupported or failing protocol path
- Final output validation errors: required skill outputs not produced
Debug order that works well:
- Re-run with trace command and fixed trace_id
- Inspect structured logs filtered by trace_id
- Inspect failing step mapping in skill yaml
- Inspect binding request/response templates
- Confirm service implementation output shape
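For the log inspection step in the debug order above, a short filter over JSON lines is often enough. The sketch below assumes each log record is a JSON object with a top-level trace_id key; the other field names in the sample data are invented for illustration and may not match the real log schema.

```python
import json
from typing import Iterable, Iterator


def filter_by_trace(lines: Iterable[str], trace_id: str) -> Iterator[dict]:
    """Yield parsed log records whose trace_id matches."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise interleaved with the logs
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            yield record


# Sample lines; the "event" and "step_id" keys are hypothetical.
sample = [
    '{"trace_id": "trace-001", "event": "step_started", "step_id": "s1"}',
    '{"trace_id": "trace-002", "event": "step_started", "step_id": "s1"}',
    'not json at all',
    '{"trace_id": "trace-001", "event": "step_failed", "step_id": "s2"}',
]
matches = list(filter_by_trace(sample, "trace-001"))
```

The same function works on an open file handle, since iterating a text file yields one line at a time.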
9) Configuration Surfaces¶
Repository-level:
- bindings/official/
- services/official/
- policies/official_default_selection.yaml
Host-level overrides (.agent-skills):
- services.yaml
- bindings/local/
- bindings/candidate/
- active_bindings.json
- overrides.yaml
10) Design Constraints¶
Current runner behavior intentionally keeps:
- DAG-based step scheduling with backward-compatible implicit sequential deps
- explicit depends_on: [] to opt into parallel execution
- explicit mapping instead of implicit field matching
- strict response mapping (missing fields fail fast)
- protocol abstraction via invoker routing
- thread-safe state mutations via _StateLock during parallel execution
See docs/SCHEDULER.md for full scheduler documentation.
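The scheduling constraints above can be sketched as two functions: one that derives dependencies (a step without a depends_on declaration implicitly depends on the step before it) and one that groups steps into dispatch waves. This is a simplified model under assumptions, not the scheduler itself: the step dict shape with an id key, the function names, and the wave-based grouping are hypothetical; see docs/SCHEDULER.md for the actual behavior.

```python
def build_dependencies(steps: list[dict]) -> dict[str, set[str]]:
    """Map each step id to the set of step ids it must wait for."""
    deps: dict[str, set[str]] = {}
    previous = None
    for step in steps:
        declared = step.get("config", {}).get("depends_on")
        if declared is None:
            # No declaration: implicit sequential dependency on the prior step.
            deps[step["id"]] = {previous} if previous else set()
        else:
            # Explicit declaration, including [] which opts into parallelism.
            deps[step["id"]] = set(declared)
        previous = step["id"]
    return deps


def dispatch_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; steps in one wave could run in parallel."""
    remaining = {step: set(d) for step, d in deps.items()}
    done: set[str] = set()
    waves: list[list[str]] = []
    while remaining:
        ready = sorted(s for s, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in step graph")
        waves.append(ready)
        done.update(ready)
        for s in ready:
            del remaining[s]
    return waves


steps = [
    {"id": "a", "config": {"depends_on": []}},
    {"id": "b", "config": {"depends_on": []}},
    {"id": "c", "config": {"depends_on": ["a", "b"]}},
    {"id": "d"},  # legacy step: implicitly waits for "c"
]
waves = dispatch_waves(build_dependencies(steps))
legacy = dispatch_waves(build_dependencies([{"id": "x"}, {"id": "y"}, {"id": "z"}]))
```

In this model a skill with no depends_on declarations degenerates to one step per wave, which is how the implicit dependencies preserve the older sequential behavior.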
11) Current Baseline¶
As documented in docs/PROJECT_STATUS.md, runner baseline is currently stable with:
- 45/45 contract pass
- 8/8 smoke pass
- full capability coverage and skill executability (45/45 capabilities, 36/36 skills)
- DAG scheduler functional tests: 5/5
- DAG scheduler stress tests: 5/5
- CognitiveState v1 regression tests: 86/86
- CognitiveState v1 integration tests: 99/99
This is the recommended baseline before starting MCP/OpenAPI adapter expansion.
12) Safety Enforcement¶
Capabilities that declare a safety block are subject to runtime enforcement
during step execution. The enforcement runs inside _execute_step() and is
transparent to the rest of the pipeline.
Enforcement sequence¶
Before capability execution:
- Trust-level check: The capability's safety.trust_level rank is compared against ExecutionOptions.trust_level. If the context rank is lower, a SafetyTrustLevelError is raised. Ranks: sandbox=0, standard=1, elevated=2, privileged=3.
- Confirmation check: If safety.requires_confirmation is true and the capability id is not in ExecutionOptions.confirmed_capabilities, a SafetyConfirmationRequiredError is raised.
- Mandatory pre-gates: Each gate in safety.mandatory_pre_gates is executed as a capability. If the gate returns {"allowed": false}, the on_fail policy determines behavior:
  - block: raises SafetyGateFailedError (default)
  - warn: emits a safety_gate_warning event and continues
  - degrade: returns a degraded StepResult (step skipped)
  - require_human: raises SafetyConfirmationRequiredError
After capability execution:
- Mandatory post-gates: Same mechanism as pre-gates, but each gate receives the produced output as input.
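The trust-level and confirmation checks from the sequence above can be sketched as follows. This is an illustration only: Options stands in for ExecutionOptions, the exception classes here are plain stand-ins rather than the runtime's RuntimeErrorBase subclasses, and the function name and the dict form of the safety block are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Ranks as documented: sandbox=0, standard=1, elevated=2, privileged=3.
TRUST_RANKS = {"sandbox": 0, "standard": 1, "elevated": 2, "privileged": 3}


class SafetyTrustLevelError(Exception):
    """Stand-in for the runtime's error of the same name."""


class SafetyConfirmationRequiredError(Exception):
    """Stand-in for the runtime's error of the same name."""


@dataclass(frozen=True)
class Options:
    """Hypothetical stand-in for ExecutionOptions."""
    trust_level: str = "standard"
    confirmed_capabilities: frozenset = frozenset()


def check_before_execution(capability_id: str, safety: Optional[dict],
                           options: Options) -> None:
    """Raise if the context may not run this capability; otherwise return."""
    if not safety:
        return  # no safety block: no additional checks
    required = safety.get("trust_level")
    if required is not None and TRUST_RANKS[options.trust_level] < TRUST_RANKS[required]:
        raise SafetyTrustLevelError(
            f"{capability_id} requires {required}, context is {options.trust_level}"
        )
    if safety.get("requires_confirmation") and capability_id not in options.confirmed_capabilities:
        raise SafetyConfirmationRequiredError(f"{capability_id} needs confirmation")


safety_block = {"trust_level": "elevated", "requires_confirmation": True}
allowed = Options(trust_level="privileged",
                  confirmed_capabilities=frozenset({"fs.delete"}))
check_before_execution("fs.delete", safety_block, allowed)  # passes silently
```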
Capabilities without safety¶
Capabilities that do not declare a safety block are executed without any
additional checks. The enforcement is zero-cost for the ~80% of capabilities
that have no side effects.
ExecutionOptions safety fields¶
- trust_level (str): Runtime trust level for the execution context (default: "standard").
- confirmed_capabilities (frozenset[str]): Set of capability ids pre-confirmed by the caller, bypassing requires_confirmation.
Error types¶
- SafetyTrustLevelError: trust level insufficient
- SafetyGateFailedError: a mandatory gate blocked execution
- SafetyConfirmationRequiredError: human confirmation required
All three inherit from RuntimeErrorBase and carry capability_id and
step_id context.
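To show how these error types relate to the gate on_fail policies, here is a minimal sketch. The constructor signature of RuntimeErrorBase, the apply_gate_result helper, its string return values, and the event dict shape are all assumptions; only the policy names, the error names, and the capability_id/step_id context come from this guide. A gate result is treated as blocking only when allowed is explicitly false.

```python
class RuntimeErrorBase(Exception):
    """Stand-in base class carrying capability_id and step_id context."""

    def __init__(self, message: str, *, capability_id: str, step_id: str):
        super().__init__(message)
        self.capability_id = capability_id
        self.step_id = step_id


class SafetyGateFailedError(RuntimeErrorBase):
    pass


class SafetyConfirmationRequiredError(RuntimeErrorBase):
    pass


def apply_gate_result(gate_result: dict, on_fail: str = "block", *,
                      capability_id: str, step_id: str, events: list) -> str:
    """Return 'continue' or 'degraded', or raise according to on_fail."""
    if gate_result.get("allowed") is not False:
        return "continue"
    context = {"capability_id": capability_id, "step_id": step_id}
    if on_fail == "block":
        raise SafetyGateFailedError("gate blocked execution", **context)
    if on_fail == "warn":
        events.append({"type": "safety_gate_warning", **context})
        return "continue"
    if on_fail == "degrade":
        return "degraded"  # caller would return a degraded StepResult
    if on_fail == "require_human":
        raise SafetyConfirmationRequiredError("human confirmation required", **context)
    raise ValueError(f"unknown on_fail policy: {on_fail!r}")


events: list = []
outcome = apply_gate_result({"allowed": False}, "warn",
                            capability_id="fs.delete", step_id="s3", events=events)
```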