RFC-0002: Durable Execution State Machine

Status: Draft
Date: 2026-05-26
Depends on: docs/rfcs/RFC-0001-ORCA-RUNTIME-BLUEPRINT.md

1. Scope

Define canonical run lifecycle semantics for durable execution:

Allowed transitions:

Terminal states:

Checkpoint creation is mandatory at these boundaries:

Checkpoint creation is optional at these boundaries:

Rationale: keep durable semantics strong while avoiding unnecessary persistence overhead.

Minimum fields:

Compatibility note: existing fields (result/error/status) remain populated during migration, but RunStore v2 is the canonical internal model.

Resume preconditions:

run.status in {waiting_for_human, waiting_for_signal, failed, canceled}.
checkpoint_head exists.
policy re-evaluation succeeds, or forced recovery mode is explicitly enabled.
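
The three preconditions above can be checked in one place before any resume work starts. The following is a minimal sketch, not the runtime's actual API: `RunRecord`, `check_resume_preconditions`, and the `policy_ok` / `forced_recovery` parameters are hypothetical names chosen for illustration. Only the status names and `checkpoint_head` come from this RFC.

```python
from dataclasses import dataclass
from typing import List, Optional

# Statuses from which this RFC allows a resume.
RESUMABLE_STATUSES = {"waiting_for_human", "waiting_for_signal", "failed", "canceled"}


@dataclass
class RunRecord:
    """Hypothetical, trimmed-down run record; field layout is an assumption."""
    run_id: str
    status: str
    checkpoint_head: Optional[str] = None  # id of the latest checkpoint, if any


def check_resume_preconditions(
    run: RunRecord, policy_ok: bool, forced_recovery: bool = False
) -> List[str]:
    """Return the violated preconditions; an empty list means resume may proceed."""
    problems = []
    if run.status not in RESUMABLE_STATUSES:
        problems.append(f"status {run.status!r} is not resumable")
    if run.checkpoint_head is None:
        problems.append("checkpoint_head is missing")
    # A failed policy re-evaluation blocks resume only when forced
    # recovery mode has not been explicitly enabled.
    if not policy_ok and not forced_recovery:
        problems.append("policy re-evaluation failed")
    return problems
```

Returning every violation, rather than raising on the first one, is a design choice of this sketch: it lets an operator see in a single response why a run cannot be resumed.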

Resume behavior:

Replay behavior:

Replay never mutates an existing run in place.
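
One way to honor this rule is to have replay build a new record and reuse recorded side-effect results instead of re-triggering the side effects. The sketch below assumes a run is a plain dict with `run_id`, `results`, and `side_effect_results` keys, and that steps are `(name, function)` pairs. All of these shapes, and the `replay_of` linkage field, are assumptions made for illustration and are not defined by this RFC.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical step shape: (step name, function of the results so far).
Step = Tuple[str, Callable[[Dict[str, Any]], Any]]


def replay_from_step(
    original: Dict[str, Any], steps: List[Step], start_index: int, new_run_id: str
) -> Dict[str, Any]:
    """Replay steps[start_index:] into a NEW record; `original` is only read."""
    recorded = original["side_effect_results"]
    # Results before the step boundary are copied, never shared with the original.
    results = {
        name: copy.deepcopy(original["results"][name])
        for name, _ in steps[:start_index]
    }
    for name, fn in steps[start_index:]:
        if name in recorded:
            # Reuse the recorded result so the side effect is not triggered again.
            results[name] = copy.deepcopy(recorded[name])
        else:
            # Steps without a recorded result are assumed pure and are re-executed.
            results[name] = fn(results)
    return {"run_id": new_run_id, "replay_of": original["run_id"], "results": results}
```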

Fork behavior:

Clone a run from a selected checkpoint.
Assign new run_id.
Preserve source linkage metadata.
Allow different runtime options/policies unless restricted by environment policy.
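
The fork steps above can be sketched as follows. The `Run` and `Checkpoint` dataclasses, the `forked_from_*` field names, and the `option_overrides` parameter are hypothetical; the RFC only requires a new `run_id`, preserved source linkage, and the ability to change options. Enforcement of environment policy restrictions is deliberately left out of this sketch.

```python
import copy
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Checkpoint:
    """Hypothetical checkpoint shape."""
    checkpoint_id: str
    run_id: str
    state: Dict[str, Any]


@dataclass
class Run:
    """Hypothetical run shape; the linkage field names are assumptions."""
    run_id: str
    state: Dict[str, Any]
    options: Dict[str, Any] = field(default_factory=dict)
    forked_from_run_id: Optional[str] = None
    forked_from_checkpoint_id: Optional[str] = None


def fork_run(
    source: Run, checkpoint: Checkpoint, option_overrides: Optional[Dict[str, Any]] = None
) -> Run:
    """Clone `source` at `checkpoint` into a new run with its own run_id."""
    if checkpoint.run_id != source.run_id:
        raise ValueError("checkpoint does not belong to the source run")
    # Overrides win over the source run's options; the source is not modified.
    options = {**source.options, **(option_overrides or {})}
    return Run(
        run_id=uuid.uuid4().hex,                   # assign a new run_id
        state=copy.deepcopy(checkpoint.state),     # clone, do not share state
        options=options,
        forked_from_run_id=source.run_id,          # preserve source linkage
        forked_from_checkpoint_id=checkpoint.checkpoint_id,
    )
```

Deep-copying the checkpoint state keeps the fork independent: later mutations in the forked run cannot leak back into the stored checkpoint.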

The current run status model in docs/ASYNC_EXECUTION.md supports only running/completed/failed. Decision: the canonical runtime status model is adopted now; the legacy response shape becomes a temporary projection layer.
The current run_store.py stores minimal execution metadata. Decision: RunStoreV2 becomes the authoritative store for run state. Existing RunStore methods remain as compatibility wrappers only.
The current checkpoint.py serializes state but has no lifecycle orchestration. Decision: CheckpointManager becomes a mandatory lifecycle component and wraps the existing serializer with a versioned upgrade path.
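
To illustrate what "wraps the existing serializer with a versioned upgrade path" could look like, here is a minimal sketch. It is not the planned `CheckpointManager` interface: the `save`/`load` methods, the `schema_version` envelope, the `UPGRADERS` table, and the `step_index` field added by the example upgrade are all assumptions, and `json` stands in for the existing serializer in checkpoint.py.

```python
import json
from typing import Any, Callable, Dict

CURRENT_VERSION = 2  # hypothetical current checkpoint schema version


def _v1_to_v2(state: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical upgrade step: v2 adds a `step_index` field."""
    upgraded = dict(state)
    upgraded.setdefault("step_index", 0)
    return upgraded


# Each entry upgrades a state from version N to version N + 1.
UPGRADERS: Dict[int, Callable[[Dict[str, Any]], Dict[str, Any]]] = {1: _v1_to_v2}


class CheckpointManager:
    """Sketch: wraps a serializer and upgrades old checkpoints on load."""

    def __init__(self, serialize=json.dumps, deserialize=json.loads):
        self._serialize = serialize
        self._deserialize = deserialize

    def save(self, state: Dict[str, Any]) -> str:
        # Always write the current version in a small envelope.
        return self._serialize({"schema_version": CURRENT_VERSION, "state": state})

    def load(self, blob: str) -> Dict[str, Any]:
        envelope = self._deserialize(blob)
        # Assumption: blobs written before versioning have no envelope
        # and are treated as version 1.
        if not (isinstance(envelope, dict) and "schema_version" in envelope):
            envelope = {"schema_version": 1, "state": envelope}
        version, state = envelope["schema_version"], envelope["state"]
        if version > CURRENT_VERSION:
            raise ValueError(f"unsupported checkpoint version {version}")
        # Apply upgrade steps one version at a time until current.
        while version < CURRENT_VERSION:
            state = UPGRADERS[version](state)
            version += 1
        return state
```

Chaining single-version upgrade steps means each schema change needs only one new function, and checkpoints of any older version can still be loaded.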

Pause for confirmation and resume to completion.
Replay from step boundary using recorded side-effect results.
Fork from checkpoint and run with modified non-breaking options.
Canonical run status is exposed consistently across Python and HTTP runtime APIs.