PROBLEM: A 10-phase prose skill for running parallel LLM agents produced 7 distinct, repeated failure patterns. Cost: 4–6 hours of merge cleanup per sprint.
WHY IT MATTERS: Prose instructions are not enforcement. An LLM with all the orchestration info in context will improvise under pressure. The fix isn't clearer prose — it's removing the LLM from the orchestration decision entirely.
STACK: Airflow 3.x, Python (Celery worker, custom operators, PythonSensor with deferrable mode), cmux (terminal multiplexer), Postgres + Redis, manifest.yaml + manifest.db

Wave Run v2 — Multi-Agent Orchestration as a DAG

A 660-line prose skill, codified into an Airflow DAG with 10 mechanical phase gates.
An LLM with all the orchestration info in context will skip the "delegate to a separate session" step under pressure. The fix is not "remind the LLM more clearly." The fix is to put the orchestration decisions where the LLM can't reach them.

The 10-phase process

Each phase is an Airflow task. The scheduler enforces order; the LLM only runs inside individual sessions, not across them.
Status legend: ✅ codified · 🟡 partial · ⚪ backlog

Phase 000 — Entrypoint Contract ✅ codified
Validate trigger conf has the canonical fields before any other task runs.
Input: dag_run.conf { run_id, manifest_path, target_wave?, manifest_db_path? }
Gate: require run_id != 'wave-unset'; require manifest_path exists; target_wave int when provided
Output: PASS → resolve_execution_context | FAIL → hard block
Eliminates: phase skip / improvisation (Pattern E) — DAG can't run downstream without entrypoint pass
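The gate above can be sketched as a pure validation function. This is a minimal illustration, not the DAG's actual code: the function names validate_entrypoint_conf and entrypoint_gate are hypothetical, and only the three rules listed under Gate are modeled.

```python
from pathlib import Path
from typing import Any


def validate_entrypoint_conf(conf: dict[str, Any]) -> list[str]:
    """Return a list of contract violations; an empty list means PASS."""
    errors: list[str] = []

    run_id = conf.get("run_id")
    if not run_id or run_id == "wave-unset":
        errors.append("run_id is missing or still the 'wave-unset' placeholder")

    manifest_path = conf.get("manifest_path")
    if not manifest_path or not Path(manifest_path).exists():
        errors.append(f"manifest_path does not exist: {manifest_path!r}")

    target_wave = conf.get("target_wave")
    # bool is a subclass of int in Python, so reject it explicitly.
    if target_wave is not None and (
        isinstance(target_wave, bool) or not isinstance(target_wave, int)
    ):
        errors.append(f"target_wave must be an int when provided, got {target_wave!r}")

    return errors


def entrypoint_gate(conf: dict[str, Any]) -> None:
    """Hard block: raise so that no downstream task can run on a bad conf."""
    errors = validate_entrypoint_conf(conf)
    if errors:
        raise ValueError("Entrypoint contract failed: " + "; ".join(errors))
```

Raising inside the first task is what turns the contract into a hard block: a failed upstream task leaves every downstream task unscheduled under the default trigger rule.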
Phase 0 — Load Manifest ✅ codified
Resolve manifest path; bind run to executable session inventory.
Input: run_id, manifest_path
Logic: wave_actions._load_manifest(source_path)
  - direct wave_sessions/sessions
  - or CPV3 phase_12.manifest_path redirect
Output: resolved manifest path + session_count + target_wave metadata
Eliminates: wrong-repo worktrees (Pattern F) — repo attestation is per-session in the manifest, validated here
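One way to picture the direct-versus-redirect resolution is the sketch below. It is an assumption-heavy illustration: it reads "wave_sessions/sessions" as two alternative top-level keys, treats the manifest as an already-parsed dict, and uses a hypothetical resolve_manifest function with an injected loader in place of wave_actions._load_manifest.

```python
from typing import Any, Callable

Manifest = dict[str, Any]


def resolve_manifest(
    source_path: str,
    load: Callable[[str], Manifest],
    max_redirects: int = 1,
) -> tuple[str, list[Any]]:
    """Return (resolved_path, sessions), following at most max_redirects redirects."""
    path = source_path
    for _ in range(max_redirects + 1):
        doc = load(path)
        # Direct form: the manifest carries its own session inventory.
        sessions = doc.get("wave_sessions") or doc.get("sessions")
        if sessions:
            return path, list(sessions)
        # Redirect form: phase_12.manifest_path points at the executable manifest.
        redirect = (doc.get("phase_12") or {}).get("manifest_path")
        if not redirect:
            raise ValueError(f"{path}: no sessions and no phase_12.manifest_path redirect")
        path = redirect
    raise ValueError(f"{source_path}: too many manifest redirects")
```

Capping the redirect depth keeps a manifest that points at itself from looping forever; the cap of one is an assumption, not something the source states.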
Phase 0.5 — Bind Wave Forum Thread ✅ codified
Create or bind the per-wave coordination thread; persist thread id for audit.
Logic: 3-mode binding contract
  1) POST /api/db/intake
  2) fallback POST /api/db/threads
  3) fallback /tmp/wave-run-<slug>.log
Output: wave_forum_thread_id persisted in manifest + manifest.db run row
Why it's a phase, not an afterthought: the coordination/audit channel binds at execution start, not when something goes wrong
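The three-mode contract is an ordered fallback chain. A minimal sketch follows, assuming each mode can be wrapped as a callable that returns a thread id or raises; bind_forum_thread and log_file_binder are hypothetical names, and the two HTTP modes are left to the caller to supply.

```python
from typing import Callable, Optional

Binder = Callable[[str], Optional[str]]


def bind_forum_thread(slug: str, binders: list[tuple[str, Binder]]) -> tuple[str, str]:
    """Try each binding mode in order; return (mode, thread_id) for the first that works."""
    failures: list[str] = []
    for mode, binder in binders:
        try:
            thread_id = binder(slug)
        except Exception as exc:  # a failed mode falls through to the next one
            failures.append(f"{mode}: {exc}")
            continue
        if thread_id:
            return mode, thread_id
        failures.append(f"{mode}: returned no thread id")
    raise RuntimeError("All binding modes failed: " + "; ".join(failures))


def log_file_binder(slug: str) -> str:
    """Last-resort mode: the 'thread' is a local log path."""
    return f"/tmp/wave-run-{slug}.log"
```

Returning the mode alongside the id means the audit record can show which channel the run actually bound to, rather than only that binding succeeded.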
Phase 0.7 — Preflight Lint ✅ codified
Mechanical lint of every directive: frontmatter, required sections, MERGE GATE block auto-injected.
Input: directive_dir + expected_sessions (target-wave scoped)
Logic: _preflight_errors() → _autofix_directives() → _dispatch_repair_directive() (one retry)
Output: PASS or fail-closed after retry
Eliminates: VP self-merge (Pattern A) — every directive lands with MERGE GATE block; VPs no longer default-merge at end of session
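The lint, autofix, re-lint loop can be sketched as below. Several details are assumptions made for illustration: the "## MERGE GATE" marker text, the wording of the injected block, and the check that frontmatter opens with a "---" line. The repair-directive dispatch is not modeled; the single retry is collapsed into one autofix pass.

```python
MERGE_GATE_HEADING = "## MERGE GATE"  # hypothetical marker text
MERGE_GATE_BLOCK = (
    f"{MERGE_GATE_HEADING}\n"
    "Do not merge. Stop after the handoff; the orchestrator session owns merges.\n"
)


def lint_directive(text: str) -> list[str]:
    """Return lint errors for one directive; an empty list means it passes."""
    errors = []
    if not text.startswith("---\n"):
        errors.append("missing frontmatter")
    if MERGE_GATE_HEADING not in text:
        errors.append("missing MERGE GATE block")
    return errors


def autofix_directive(text: str) -> str:
    """Apply only the fix that is safe to make mechanically: inject the gate block."""
    if MERGE_GATE_HEADING not in text:
        text = text.rstrip("\n") + "\n\n" + MERGE_GATE_BLOCK
    return text


def preflight(text: str) -> str:
    """Lint, autofix once, lint again; raise (fail closed) if errors remain."""
    if not lint_directive(text):
        return text
    fixed = autofix_directive(text)
    remaining = lint_directive(fixed)
    if remaining:
        raise RuntimeError("Preflight lint failed after retry: " + ", ".join(remaining))
    return fixed
```

The asymmetry is deliberate in this sketch: a missing gate block is injected silently, while missing frontmatter cannot be guessed and therefore blocks the run.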
Phase 1 — Ownership Check ✅ codified
Hard-fail if two parallel sessions claim the same files.
Logic: validate_ownership_overlap()
  if fail → _dispatch_repair_directive() + one retry
Output: PASS or RuntimeError("Ownership gate failed after repair retry")
Eliminates: merge-conflict-by-design — overlapping file ownership is a precondition error, caught before dispatch
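The overlap check reduces to inverting the session-to-files map and flagging any file with more than one claimant. The sketch below assumes ownership is declared as exact file paths (globs and directory claims are not handled) and uses hypothetical function names rather than the real validate_ownership_overlap().

```python
from collections import defaultdict


def find_ownership_overlaps(owned: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each contested file to the sorted list of sessions that claim it."""
    claimants: dict[str, set[str]] = defaultdict(set)
    for session, files in owned.items():
        for path in files:
            claimants[path].add(session)
    return {path: sorted(s) for path, s in claimants.items() if len(s) > 1}


def ownership_gate(owned: dict[str, list[str]]) -> None:
    """Raise before dispatch if any file has more than one owner."""
    overlaps = find_ownership_overlaps(owned)
    if overlaps:
        detail = "; ".join(f"{p} claimed by {', '.join(s)}" for p, s in overlaps.items())
        raise RuntimeError("Ownership gate failed: " + detail)
```

For example, if sessions s1 and s2 both list b.py, find_ownership_overlaps reports b.py with both session names, and ownership_gate raises before any agent is launched.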
Phase 2 — Tap #1 (Pre-launch approval) ✅ codified
Typed human approval state machine before dispatch.
Input: marker files (.approved/.rejected/.hold/.expired) OR manifest.db decision events
Logic: _poll_tap_decision() → sensor_tap_prelaunch()
States: approved → proceed; pending → poll; rejected/hold/expired → hard-fail with terminal decision
Why typed states: previously, an approval timeout silently degraded to "go anyway"; now any terminal non-approval blocks dispatch
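A sketch of the typed decision logic, covering the marker-file input only. The file naming (<run_id>.approved and so on), the precedence of rejections over approvals, and the names TapDecision, read_tap_decision and tap_poke are assumptions for illustration; the manifest.db event path is not modeled.

```python
from enum import Enum
from pathlib import Path


class TapDecision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    HOLD = "hold"
    EXPIRED = "expired"
    PENDING = "pending"


# Assumed precedence: terminal non-approvals are checked first, so a stray
# .approved marker cannot override a rejection.
_MARKER_ORDER = [
    TapDecision.REJECTED,
    TapDecision.HOLD,
    TapDecision.EXPIRED,
    TapDecision.APPROVED,
]


def read_tap_decision(marker_dir: Path, run_id: str) -> TapDecision:
    """Return the first decision whose marker file exists, else PENDING."""
    for decision in _MARKER_ORDER:
        if (marker_dir / f"{run_id}.{decision.value}").exists():
            return decision
    return TapDecision.PENDING


def tap_poke(marker_dir: Path, run_id: str) -> bool:
    """Sensor-style poke: True = proceed, False = keep polling, raise = terminal block."""
    decision = read_tap_decision(marker_dir, run_id)
    if decision is TapDecision.APPROVED:
        return True
    if decision is TapDecision.PENDING:
        return False
    raise RuntimeError(f"Tap #1 ended with terminal decision: {decision.value}")
```

The point of the three-way return is that there is no code path from "not approved" to "proceed": a timeout has to surface as an explicit expired state, which raises.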
Phase 3 — Worktree Gate + Dispatch ✅ codified
Verify worktrees exist + branch alignment before launching any session.
Logic:
  worktree_gate(): verify path exists + git worktree list maps path → expected branch
  launch_wave_sessions(): dispatch_directive(...) per session, persist manifest after each success
Output: per-session status=launched OR first-failure halts remainder (no partial-launch desync)
Eliminates: worktree gate skipped (Pattern C) — sessions can't accidentally commit to main because the gate verifies branch isolation BEFORE any agent starts
Phase 4 — Execute & Monitor ✅ codified
Wait for handoffs with freshness validation; never accept stale state as "ready."
Logic:
  sensor_wait_wave_sessions()
    - require non-empty selected sessions
    - handoff_exists_strict(...)
    - freshness check vs session_start
  validate_session_outputs(): same freshness check before commit readiness
Output: fresh + complete → advance | stale/empty → block (never false-ready)
Eliminates: teardown skip (Pattern D): the Phase 4 sensor blocks until /end-session has landed a fresh handoff for every session
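The freshness rule can be sketched as a comparison between the handoff file's modification time and the session start. This assumes handoffs are files on disk and that session_start is available as epoch seconds; handoff_is_fresh and wave_ready are hypothetical names, not the real handoff_exists_strict(...).

```python
from pathlib import Path


def handoff_is_fresh(handoff: Path, session_start: float) -> bool:
    """True only if the handoff exists, is non-empty, and was modified at or
    after session_start (epoch seconds)."""
    try:
        stat = handoff.stat()
    except FileNotFoundError:
        return False
    return stat.st_size > 0 and stat.st_mtime >= session_start


def wave_ready(handoffs: dict[str, Path], starts: dict[str, float]) -> bool:
    """All selected sessions must have a fresh handoff; an empty selection is never ready."""
    if not handoffs:
        return False
    return all(handoff_is_fresh(path, starts[sid]) for sid, path in handoffs.items())
```

The explicit empty-selection check matters because all() over an empty collection is True in Python, which is exactly the false-ready case the phase is meant to rule out.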
Phase 5–6 — Orchestrator Merge 🟡 partial
Spawn a separate orchestrator session to execute merges. Coordinator never runs git itself.
Logic: merge orchestrator dispatched as autonomous session
  - generates orchestrator directive from template
  - runs merge_branches.sh with conflict detection + rollback
Why this is the most important phase: the COORDINATOR (this DAG) has no git access. The ORCHESTRATOR (a separate cmux session) does. Separation is mechanical, not by convention.
Eliminates: coordinator merges directly (Pattern B) — DAG tasks have no shell access to git

The 7 failure patterns this DAG eliminated

Source: forensic analysis of 38 wave-run handoffs across 9 sprints (Apr 11–20, 2026).

Pattern | Frequency | Cost / incident | How v2 prevents it
A: VP Self-Merge | 3 in 1 sprint | ~1 day cleanup | Phase 0.7 lint auto-injects MERGE GATE block into every directive before dispatch
B: Coordinator Merges Directly | 4 incidents, 3 sprints | 30–90 min | Coordinator is a DAG: no shell, no git. Orchestrator is a separately dispatched session
C: Worktree Gate Skipped | 1 critical (Apr 13) | 60+ min | Phase 3 task worktree_gate blocks dispatch unless git worktree list matches manifest
D: Teardown Skipped | 1 incident | ~1 day cleanup | Phase 4 sensor blocks until each session writes a fresh handoff (freshness validated against session_start)
E: Phase Skip / Improvisation | 2 incidents | varies | Airflow scheduler enforces phase order. There's nothing to improvise: the next task is whatever the DAG runs next
F: Wrong-Repo Worktrees | 1 incident (CYOA) | ~30 min | Per-session repo attestation in directive frontmatter, validated at Phase 0 (Load Manifest)
G: Dispatch Bypass | 1 incident | varies | Dispatch is a Python module called by the DAG, not a prose instruction the operator can ignore

Why this matters as a Generative AI engineering pattern

Most "agents fail at X" stories end with "we wrote a stricter prompt." That's a moving target. The real fix is to identify what the LLM shouldn't be deciding, then put those decisions in deterministic code where the LLM can't reach them.

In this system, the LLM is excellent at doing the work inside a session — writing code, running tests, producing handoffs. It is unreliable at coordinating across sessions, where context is thin and incentives compound (every agent thinks it should "ship now"). So the architecture moved orchestration into Airflow, kept LLMs inside the per-session boxes, and gave the human a single approval surface.

The result is the GenAI version of "separate the things that change from the things that don't." Coordination doesn't change between sprints — it's a fixed 10-phase contract. The work inside each session changes every time. So coordination is code; work is LLM.

Result:
7 documented failure patterns from the prose-skill version eliminated mechanically.
~4–6 Diane-hours of merge cleanup per sprint reclaimed. Across the 9 documented sprints (Apr 11–20, 2026), that totals roughly 36–54 hours, or about 4.5–7 eight-hour working days returned.