Skill Governance Manifesto
Date: 2026-03-13
Status: Proposed baseline for implementation
Scope: Product-level trust model for portable skills over abstract capabilities
1. Product Thesis
The product must guarantee two things at the same time:
- Portability:
  - Skills are high-level workflows built on abstract capabilities.
  - Capability contracts are the source of truth in registry YAML.
- Trust:
  - A skill should not be considered reliable only because it executes.
  - Reliability must be evidenced through internal validation and field behavior.
This manifesto defines how to keep portability while controlling quality uncertainty introduced by provider and binding flexibility.
2. Non-Negotiable Invariants
- Source of truth for semantics:
  - Capabilities are defined in registry YAML and remain canonical.
  - Runtime does not execute from generated catalogs.
- Separation of concerns:
  - Registry catalogs describe and index.
  - Operational quality artifacts evaluate behavior and trust.
- Override safety model:
  - User overrides are allowed and encouraged.
  - Last-resort fallback must always be the official default binding.
- Explainability:
  - Every effective execution path must be explainable to a human user.
3. Capability Leveling Model
Capabilities must be high-level enough for agent composition but narrow enough to minimize hallucination risk.
3.1 Risk classes
A. Deterministic operations (low epistemic risk)
  - Examples: data.schema.validate, data.json.parse, table.row.filter.
B. Retrieval operations (medium risk)
  - Examples: web.page.fetch, fs.file.read, pdf.document.read.
  - Variability comes from source quality and external system behavior.
C. Generative and decision operations (higher risk)
  - Examples: text.content.summarize, text.content.classify, agent.plan.generate, agent.input.route.
  - Variability comes from model behavior and provider differences.
3.2 Design rule
A capability is valid for this product if:
- It has a clear contract boundary.
- It can be tested for conformance independent of workflow context.
- It does not encode provider-specific semantics in its public contract.
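To make the design rule concrete, a registry entry satisfying all three criteria might look like the sketch below. Every field name here is an illustrative assumption, not a confirmed registry schema; the capability name and risk class follow the vocabulary of section 3.1.

```yaml
# Hypothetical registry entry -- field names are illustrative only.
capability: text.content.summarize
risk_class: generative            # deterministic | retrieval | generative
contract:                          # clear contract boundary
  input:
    text: string
    max_words: integer
  output:
    summary: string
  invariants:
    - output.summary is non-empty when input.text is non-empty
errors:                            # normalized, provider-agnostic classes
  - input_too_large
  - provider_unavailable
conformance:                       # testable independent of workflow context
  profiles: [strict, standard, experimental]
  vectors_ref: tests/text.content.summarize/vectors.yaml
```

Note that nothing in the public contract names a provider; provider-specific behavior is confined to bindings and their conformance profiles.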
4. Trust and Quality Model
4.1 Two evidence channels
- Internal evidence (cold-start readiness)
  - Contract tests
  - Smoke tests
  - Human review
  - Control point validation
- Field evidence (product maturity)
  - Execution success rates
  - Latency profiles
  - User ratings
  - User reports and severe incidents
4.2 Lifecycle states
- draft
- validated
- lab-verified
- trusted
- recommended
Cold-start is explicitly supported: skills can be lab-verified before field usage volume exists.
4.3 Scores
- readiness_score: internal evidence
- field_score: production behavior
- overall_score: weighted by sample volume
Weighting model by executions_30d:
- < 20: overall = readiness only
- 20 to 49: readiness-weighted mixed score
- >= 50: field-weighted mixed score
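The volume-based weighting can be sketched as a small function. The thresholds (20 and 50) come from the model above; the blend weights (0.7/0.3) are illustrative assumptions, since the manifesto fixes only the thresholds.

```python
def overall_score(readiness: float, field: float, executions_30d: int) -> float:
    """Blend internal readiness and field scores by 30-day sample volume.

    Thresholds follow the weighting model in section 4.3; the 0.7/0.3
    blend weights are illustrative, not normative values.
    """
    if executions_30d < 20:
        return readiness                       # cold start: internal evidence only
    if executions_30d < 50:
        return 0.7 * readiness + 0.3 * field   # readiness-weighted mixed score
    return 0.3 * readiness + 0.7 * field       # field-weighted mixed score
```

This keeps cold-start support explicit: a lab-verified skill with zero field volume still gets a meaningful score from readiness alone.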
5. Mitigating Execution Uncertainty with Portability
Portability introduces backend variance. This is controlled by explicit conformance and routing policy.
5.1 Binding conformance profile
Each binding/provider path is tagged by conformance profile:
- strict: strongest contract compliance and deterministic behavior under policy
- standard: acceptable for production with known variance bounds
- experimental: available for opt-in, not default for trusted paths
5.2 Runtime selection policy
- User-selected primary binding executes first.
- Optional user-defined fallback executes next.
- Official default binding is mandatory terminal fallback.
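A minimal sketch of this selection policy, assuming bindings are identified by name and `run` executes a capability against one binding (both names are hypothetical):

```python
from typing import Any, Callable, List, Optional

def resolve_binding_chain(user_primary: Optional[str],
                          user_fallback: Optional[str],
                          official_default: str) -> List[str]:
    """Order of attempts: user-selected primary, optional user-defined
    fallback, then the official default as the mandatory terminal fallback."""
    chain = [b for b in (user_primary, user_fallback) if b is not None]
    if official_default not in chain:
        chain.append(official_default)   # terminal fallback is always present
    return chain

def execute_with_fallback(chain: List[str], run: Callable[[str], Any]):
    """Try each binding in order; return (binding_used, result) on first success."""
    last_err: Optional[Exception] = None
    for binding in chain:
        try:
            return binding, run(binding)
        except Exception as err:  # in practice, normalize via the error taxonomy
            last_err = err
    raise RuntimeError(f"all bindings failed; last error: {last_err}")
```

Returning the binding actually used alongside the result is what later feeds the explainability surfaces: the effective path is recorded, not inferred.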
5.3 Capability assurance contract
For each capability, define:
- Invariants that must hold across providers.
- Error taxonomy and normalization rules.
- Safety constraints and side-effect boundaries.
- Conformance test vectors and expected outcomes.
6. Human UX Guardrails (Critical)
Quality controls must not degrade usability.
6.1 No-complexity default experience
If the user does nothing:
- Defaults are ready and executable.
- Credentials are clearly prompted when needed.
- Errors are actionable and short.
6.2 Optional complexity only when requested
Advanced controls (fallback chains, custom providers, strict profiles) are optional and progressive.
6.3 Explainability surfaces
For each capability or skill execution, the user can inspect:
- Effective binding and service.
- Why it was selected.
- Fallback path used, if any.
- Trust state of the skill and evidence source.
6.4 Honest trust labels
Always show trust evidence source:
- internal-evidence
- mixed-evidence
- field-evidence
Users must never confuse internal validation with broad field reliability.
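A sketch of how the label could be derived mechanically, so it can never drift from the underlying evidence. The thresholds mirror the section 4.3 weighting model; the mapping itself is an illustrative assumption.

```python
def trust_evidence_label(has_internal_evidence: bool, executions_30d: int) -> str:
    """Map available evidence to an honest trust label.

    Thresholds (20/50) follow the section 4.3 weighting model;
    the label mapping is illustrative, not a confirmed policy.
    """
    if executions_30d >= 50:
        return "field-evidence"
    if has_internal_evidence and executions_30d >= 20:
        return "mixed-evidence"
    return "internal-evidence"
```

Deriving the label from the same counters that drive scoring is one way to guarantee users see "internal-evidence" until real field volume exists.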
7. Architecture Changes Required
This section maps the strategy to implementation deltas.
7.1 Registry layer (agent-skill-registry)
Keep unchanged as semantic source:
- Capabilities remain canonical in YAML.
- Skills remain declarative workflows.
- Catalog generation remains descriptive.
Additions (non-breaking):
- Optional metadata conventions for trust-oriented documentation.
- Governance documentation for lifecycle semantics.
7.2 Runtime layer (agent-skills)
Add or extend:
- Binding conformance metadata and profile enforcement.
- Deterministic fallback resolution policy with official terminal fallback.
- Quality ingestion pipeline:
  - internal evidence
  - usage evidence
  - feedback evidence
- Skill quality catalog generation and exposure.
- Explainability endpoints or commands for effective path introspection.
7.3 Observability layer
Add normalized metrics for quality:
- success and failure classes by capability and binding
- latency percentiles by capability and binding
- fallback activation counts
- user feedback aggregates
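These four metric families can be sketched as a small in-memory recorder. Metric key shapes, the label set, and the nearest-rank percentile approximation are all illustrative choices, not a prescribed schema.

```python
from collections import defaultdict
from typing import Optional

class QualityMetrics:
    """Minimal in-memory sketch of the normalized quality metrics."""

    def __init__(self):
        self.outcomes = defaultdict(int)    # (capability, binding, class) -> count
        self.latencies = defaultdict(list)  # (capability, binding) -> samples (ms)
        self.fallbacks = defaultdict(int)   # capability -> fallback activations

    def record(self, capability: str, binding: str, outcome: str,
               latency_ms: float, used_fallback: bool = False) -> None:
        self.outcomes[(capability, binding, outcome)] += 1
        self.latencies[(capability, binding)].append(latency_ms)
        if used_fallback:
            self.fallbacks[capability] += 1

    def p95(self, capability: str, binding: str) -> Optional[float]:
        """Nearest-rank 95th percentile over recorded latency samples."""
        xs = sorted(self.latencies[(capability, binding)])
        if not xs:
            return None
        return xs[max(0, int(round(0.95 * len(xs))) - 1)]
```

Keying everything by (capability, binding) is what makes per-binding conformance regressions visible instead of being averaged away across providers.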
7.4 Consumer API and CLI
Add user-centric introspection operations:
- explain capability resolution
- explain skill trust state and evidence
- list recommended and trusted skills with evidence source
8. Rollout Plan
Phase 1: Foundation (short)
- Formalize lifecycle and scoring policy.
- Publish conformance profile definitions.
- Keep defaults simple, with no regressions for current users.
Phase 2: Assurance and explainability
- Add capability assurance contracts for prioritized capabilities.
- Add binding conformance checks.
- Expose explainability surfaces in CLI/API.
Phase 3: Product trust maturity
- Ingest real usage and feedback signals.
- Promote skills from lab-verified to trusted/recommended based on evidence.
- Use trust states in discovery and routing preferences.
9. Success Criteria
- Users can run defaults without configuration burden.
- Skills have transparent trust states with clear evidence source.
- Overrides do not compromise baseline reliability due to terminal default fallback.
- Trusted and recommended states correlate with real quality outcomes.
- Portability remains intact while uncertainty is bounded and communicated.