PROBLEM: The token cost of running 8 VP agents across a sprint was unbounded, and waste could not be attributed to individual sessions.
WHY IT MATTERS: Without per-session telemetry, "agents are expensive" is unfalsifiable. This dashboard compares token usage before and after context-engineering changes (local extraction of prior sprint state).
STACK: Python (telemetry collector), local-extraction script for context optimization, dashboard rendering

VP Session Timeline — Before vs After

Every step from /startup to /end-session · Color = local ($0) vs Claude API ($) · Hover for plan v2 assignments

LOCAL — runs without Claude ($0)
CLAUDE API — needs LLM reasoning
DIANE — human decision
WASTE — tokens burned, no impact
REMOVED — deleted in rewrite
BEFORE: 45,200 tokens/session
AFTER: ~12,000 tokens/session
73% reduction
$93→$20/month overhead
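The headline reduction follows directly from the two per-session figures. A minimal sketch of the arithmetic, where the helper name `reduction_pct` is illustrative and not part of the project:

```python
def reduction_pct(before: float, after: float) -> float:
    """Return the share of `before` eliminated by `after`, as a percentage."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0


# Per-session figures from the dashboard header.
tokens_before = 45_200
tokens_after = 12_000  # approximate ("~12,000" on the dashboard)

print(f"{reduction_pct(tokens_before, tokens_after):.1f}% fewer tokens per session")
```

Rounded to a whole number, this gives the 73% shown above.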
BEFORE — Current Architecture
~45,200 tokens · 93% overhead
── STARTUP PHASE (~5,500 tokens) ──
Load /startup skill
10 bash blocks: pgrep, launchctl, cmux, lsof
1,800 tok
Finding #09: Extractable Work
Impact: These 10 commands run identically as a shell script
Task: M7.1 — Rewrite startup (1800→500)
Agent: VP Systems, main session
Load /morning skill (alias)
Documented alias for /standup — double load
1,793 tok
Finding #17: morning vs standup Double Load
Impact: 1,793 wasted tokens every morning
Task: M7.2 — Merge morning into standup
Agent: VP Systems, main session
Load /standup skill
Reads Forum.md (24K), 4 tracker.csv, Outstanding.md, Handoffs/
1,893 tok + ~33K file reads
Finding #19: Session Cost Paradox
Impact: 33K tokens of file reads that preflight.py does in <1 sec
Task: M7.2 — Rewrite standup (1893→600)
Script: preflight.py, standup_report.py
── VP SESSION START (~18,000 tokens) ──
Load director/SKILL.md
Full 54K file including Agent System (unused by 2/4 VPs)
13,576 tok
Finding #03: Agent System Unused
Impact: 4,000 tokens of agent routing loaded for VPs that never dispatch agents
Task: M5.1-M5.3 — Split into director-core (3K) + director-agents (2.5K)
Agent: VP Systems, main session
Load VP SKILL.md
VP-specific rules, scope, QA checklist, KPIs
2,800-4,300 tok
/self-brief: Read 5-7 files
tracker.csv, Forum.md, Outstanding.md, execution_log, handoffs
1,474 tok + ~8K file reads
Finding #09: Extractable Work
Impact: All 5 reads are already available in /tmp/reign-vp-{name}.md
Task: M7.4 — Rewrite self-brief (1474→400)
Script: preflight.py generates VP briefing packets
Step 1.5: Self-Improvement Proposal
"Runs every session" — ran 0/28 sessions
350 tok
Finding #04: Never-Ran Claims
Impact: 350 tokens loaded, never executed in 28 sessions
Task: M5.5 — Delete. Sentinel replaces this.
Synthesize session plan
Combine data → prioritized T1/T2/T3 → present to Diane
~500 tok
Diane: review plan, say "go"
Silence = approval. Redirects if needed.
── EXECUTION LOOP (×2-8 tasks, ~25K tokens) ──
Execute task
Code generation, reasoning, debugging
variable
Load /qa skill (per task)
Full 4,839 tokens reloaded per "done" declaration. 90% static.
4,839 tok × N
Finding #18: QA Silent Token Bomb
Impact: 8 tasks × 4,839 = 38,712 tokens/morning
Task: M7.9 — Rewrite QA (4839→800), inline in VP skill
Agent: VP Systems, main session
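The per-task reload is what makes this cost scale with the number of tasks rather than with the session. A rough sketch of that scaling, assuming every "done" declaration triggers exactly one full skill load (the function name `qa_reload_cost` is hypothetical):

```python
QA_SKILL_TOKENS = 4_839  # size of the /qa skill, from the dashboard


def qa_reload_cost(tasks: int, skill_tokens: int = QA_SKILL_TOKENS) -> int:
    """Tokens spent reloading the skill once per completed task."""
    if tasks < 0:
        raise ValueError("tasks must be non-negative")
    return tasks * skill_tokens


# The execution loop runs 2-8 tasks, so the overhead ranges widely.
for n in (2, 8):
    print(f"{n} tasks -> {qa_reload_cost(n):,} tokens")
```

At the low end of the loop (2 tasks) the reload costs 9,678 tokens. At 8 tasks it reaches the 38,712 quoted in the finding.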
Load /next-task skill
Priority waterfall — 5 levels of file reads
1,867 tok × N
Finding #09: Extractable Work
Impact: preflight.py --next-task does this locally in <1 sec
Task: M7.5 — Rewrite next-task (1867→500)
Script: preflight.py --next-task --vp {name}
Load /checkpoint (every 30 min)
"Mandatory" — ran ~2/28 sessions
1,768 tok × N
Finding #13: Checkpoint Absence
Impact: 1,768 tokens loaded, almost never executed
Task: M7.6 — Rewrite checkpoint (1768→400)
── SESSION END (~8,400 tokens) ──
Load /end-session skill
Orchestrates handoff + postmortem
1,691 tok
Load /handoff skill
Write structured continuity note
2,195 tok
Load /postmortem skill
Grading rubric + template (no calibration)
4,529 tok
Step 4: 8 cleanup items
Views sync, archive, Scout, reconcile, Outstanding — ALL SKIPPED
2,000 tok (dead)
Finding #10: Session-End Never Runs
Impact: 2,000 tokens, 8 items, skipped 28/28 sessions
Task: M4.1 — end_session_local.sh does these via hook
Script: end_session_local.sh (SessionEnd hook)
Analytics skills
/systems, /org, /project, /owner — 0/28 sessions
~15K tok (never loaded)
Finding #04: Never-Ran Claims
Impact: 5 analytics skills defined as Claude skills, ran 0 times
Task: M3 — sentinel.py + M4 — daily_review.py + weekly_analytics.py
AFTER — Local-First Architecture
~12,000 tokens · 80% for actual work
── PRE-FLIGHT (runs locally, $0) ──
preflight.py (cron, every 30 min)
Pre-computes ALL data: tracker, Forum, Outstanding, handoffs, exec log, parking lot → /tmp/reign-*.json
0 tok
<1 sec
M2.1: Extend preflight.py
Agent: Backend Agent (Sonnet)
Saves: ~19K tokens/morning
Script: preflight.py (already built)
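The idea is to do the file reads once, outside the model, and hand each session a compact summary. A minimal sketch of that pattern follows. It is not the project's preflight.py: the function name `build_packet`, the summary fields, and the choice to keep only the last few lines of each file are assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Dict


def build_packet(sources: Dict[str, Path], out_path: Path, tail_lines: int = 20) -> dict:
    """Summarize each source file locally and write one JSON packet.

    Hypothetical stand-in for a preflight step: the session later reads
    the single packet instead of every source file.
    """
    packet = {}
    for name, path in sources.items():
        if not path.exists():
            packet[name] = {"missing": True}
            continue
        lines = path.read_text(encoding="utf-8").splitlines()
        packet[name] = {
            "missing": False,
            "line_count": len(lines),
            # Keep only the most recent entries; an empty tail if tail_lines <= 0.
            "tail": lines[-tail_lines:] if tail_lines > 0 else [],
        }
    out_path.write_text(json.dumps(packet, indent=2), encoding="utf-8")
    return packet
```

Run from a scheduler, a function like this costs no model tokens; the session pays only for reading the resulting packet.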
sentinel.py (cron, every 2 hr)
8 health checks: systems, org, project, handoffs, quality, memory, cross-VP, cost → /tmp/sentinel.json
0 tok
<5 sec
M3.1-M3.8: Sentinel v2
Agent: Backend Agent (Sonnet)
Replaces: 5 analytics skills that ran 0/28 sessions
Saves: ~$15/mo
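A health-check runner of this kind can be small. The sketch below is an assumption about the shape, not the project's sentinel.py: the function name `run_checks`, the `status` field, and the green/red convention are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def run_checks(checks: Dict[str, Callable[[], bool]], out_path: Path) -> dict:
    """Run each named check and write a JSON report with an overall status."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A crashing check counts as a failure rather than stopping the run.
            results[name] = False
    report = {
        "status": "green" if all(results.values()) else "red",
        "checks": results,
    }
    out_path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return report
```

Writing one overall status next to the per-check results lets a later reader decide with a single field lookup whether anything needs attention.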
standup_report.py (6 AM daily)
Morning briefing pre-built → /tmp/reign-standup.md
0 tok
<1 sec
M2.3: Preflight standup cron
Agent: DevOps Agent (Sonnet)
Replaces: /standup + /morning (3,686 tokens + 33K file reads)
── STARTUP (~500 tokens) ──
/startup (rewritten, 500 tok)
Read sentinel JSON (1 line). If green → skip. Then cmux workspaces.
500 tok
M7.1: Rewrite startup
Before: 1,800 tokens + 10 bash blocks
After: 500 tokens, reads pre-computed JSON
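The rewritten step reduces to a single read and a branch. A sketch under the assumption that the sentinel report is a JSON object with a top-level `status` field (the field name and the `needs_startup_checks` helper are hypothetical):

```python
import json
from pathlib import Path


def needs_startup_checks(sentinel_path: Path) -> bool:
    """Return False only when a readable report says the system is green."""
    try:
        report = json.loads(sentinel_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # Missing or corrupt report: fall back to running the full checks.
        return True
    return not (isinstance(report, dict) and report.get("status") == "green")
```

Treating a missing or unreadable report as "not green" keeps the shortcut safe: the expensive path is skipped only on positive evidence.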
/standup (rewritten, 600 tok)
Read /tmp/reign-standup.md. Synthesize micro-decisions. Present.
600 tok
M7.2: Merge morning + rewrite standup
Before: 1,893 + 1,793 = 3,686 tokens
After: 600 tokens, reads pre-built standup
── VP SESSION START (~5,400 tokens) ──
director-core/SKILL.md (rewritten)
8 principles + execution loop + escalation. No agent routing, no file lists.
3,000 tok
M5.1: Rewrite director from scratch
Before: 13,576 tokens (48% dead weight)
After: 3,000 tokens (only hit sections)
Agent: VP Systems, main session
VP SKILL.md (rewritten, ~1,500 tok)
Rules + inline QA checklist + KPIs. No file-read lists.
1,500 tok
M6.1-M6.4: Rewrite VP skills
Before: 2,800-4,300 tokens with file-read lists
After: ~1,500 tokens, behavioral rules only
/self-brief (rewritten, 400 tok)
"Read /tmp/reign-vp-{name}.md. Present plan. Silence = approval."
400 tok
M7.4: Compress self-brief
Before: 1,474 tokens + 8K file reads
After: 400 tokens, reads 1 pre-built file
Synthesize session plan
LLM reasoning over pre-computed data → prioritized plan
~500 tok
Diane: review plan, say "go"
Same as before — human decision point.
── EXECUTION LOOP (×2-8 tasks, ~3K tokens overhead) ──
Execute task
Code generation, reasoning, debugging — unchanged
variable
QA gate (inline, ~0 extra tokens)
500-token checklist already in VP SKILL.md. No separate skill load.
0 extra
M5.8 + M7.9: Inline QA
Before: 4,839 tokens × N tasks = 38,712/morning
After: 0 extra (already in VP skill)
preflight.py --next-task (local)
Priority waterfall from ops.db. Returns: task + agent + model.
0 tok
<1 sec
M7.5: Rewrite next-task
Before: 1,867 tokens × N tasks
After: 500 tokens (reads local result)
Script: preflight.py --next-task --vp {name}
/next-task (rewritten, 500 tok)
Read local waterfall result. Only LLM reasoning if no match.
500 tok
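A priority waterfall with an LLM fallback can be sketched as an ordered list of lookups where the first hit wins. The level names and the `pick_next_task` helper below are invented for illustration; the dashboard does not list the real levels or the ops.db schema.

```python
from typing import Callable, List, Optional, Tuple

# A level is a name plus a zero-argument lookup returning a task or None.
Level = Tuple[str, Callable[[], Optional[str]]]


def pick_next_task(levels: List[Level]) -> Optional[Tuple[str, str]]:
    """Walk the levels in priority order and return the first task found.

    Returns (level_name, task), or None when no level yields a task,
    which is the case that would be handed to the LLM.
    """
    for name, lookup in levels:
        task = lookup()
        if task:
            return name, task
    return None


# Hypothetical levels and tasks, highest priority first.
levels = [
    ("blocked-handoffs", lambda: None),
    ("tracker-t1", lambda: "Example T1 task"),
    ("tracker-t2", lambda: "Example T2 task"),
]
print(pick_next_task(levels))
```

Because lookups run lazily and stop at the first match, lower-priority sources are never touched when a higher one already has work.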
── SESSION END (~2,800 tokens) ──
/end-session (rewritten, 500 tok)
"Write handoff. Run postmortem. Local cleanup runs via hook."
500 tok
M7.8: Rewrite end-session
Before: 1,691 tokens + 2,000 dead Step 4 tokens
After: 500 tokens, cleanup is local hook
/handoff (rewritten, 800 tok)
Write structured continuity note — needs LLM synthesis
800 tok
/postmortem (rewritten, 1,500 tok)
Grading + calibration questions (reads sentinel JSON for auto-caps)
1,500 tok
M8.1-M8.4: Postmortem with calibration
Before: 4,529 tokens, no calibration
After: 1,500 tokens + auto grade caps from sentinel
New: Cross-VP Awareness dimension, Diane's Time reform
end_session_local.sh (SessionEnd hook)
git status, archive Forum, reconcile, Views sync, Outstanding refresh, sentinel run
0 tok
<10 sec
M4.1 + M4.4: Session lifecycle hook
Replaces: Step 4 (2,000 tokens, skipped 28/28)
Agent: Backend Agent (Sonnet)
── BACKGROUND (crons, $0) ──
daily_review.py (midnight cron)
Replaces /review skill. Reads postmortems, calibrates grades, detects patterns.
0 tok
M4.2: daily_review.py
Replaces: /review skill that ran once in 10 days
Agent: Analyst Agent (Opus)
weekly_analytics.py (Friday 6 AM)
Replaces /owner-analytics + /project-analytics. Weekly coaching + KPI report.
0 tok
M4.3: weekly_analytics.py
Replaces: 2 analytics skills (0/28 sessions)
Agent: Analyst Agent (Opus)

Current Architecture — Full Token Breakdown

Phase | Step | Tokens | Type
Startup | /startup skill | 1,800 | waste
Startup | /morning (alias) | 1,793 | waste
Startup | /standup skill + file reads | ~35,000 | waste
VP Start | director/SKILL.md | 13,576 | 48% waste
VP Start | VP SKILL.md | 2,800-4,300 | claude
VP Start | /self-brief + file reads | ~9,500 | waste
VP Start | Step 1.5 Self-Improvement | 350 | removed
Execute | /qa × 8 tasks | 38,712 | waste
Execute | /next-task × 8 | 14,936 | waste
Execute | /checkpoint × 4 | 7,072 | removed
End | /end-session + /handoff + /postmortem | 8,415 | claude
End | Step 4 cleanup (8 items) | 2,000 | removed
TOTAL (full 4-VP morning) | | ~172,100 |

Rewritten Architecture — Full Token Breakdown

Phase | Step | Tokens | Type
Pre-flight | preflight.py + sentinel.py + standup_report.py | 0 | local
Startup | /startup (rewritten) | 500 | claude
Startup | /standup (rewritten, absorbed morning) | 600 | claude
VP Start | director-core/SKILL.md × 4 VPs | 12,000 | claude
VP Start | VP SKILL.md × 4 VPs (rewritten) | 6,000 | claude
VP Start | /self-brief × 4 VPs (rewritten) | 1,600 | claude
Execute | QA (inline in VP skill, 0 extra) | 0 | included
Execute | /next-task × 8 (rewritten) | 4,000 | claude
Execute | Local next-task waterfall | 0 | local
End | /end-session + /handoff + /postmortem × 4 | 11,200 | claude
End | end_session_local.sh (hook) | 0 | local
Background | daily_review + weekly_analytics + sentinel | 0 | local crons
TOTAL (full 4-VP morning) | | ~35,900 |
Savings: 172,100 → 35,900 tokens (79% reduction)
$93/mo → ~$20/mo overhead · ~$73/mo freed for actual work
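The rewritten total and the token reduction can be re-derived from the rows above. A quick check in Python, with the local and inline rows omitted because they contribute 0 tokens (the dictionary keys are shortened labels, not project identifiers):

```python
# Token counts from the rewritten-architecture table (full 4-VP morning).
rewritten = {
    "/startup": 500,
    "/standup": 600,
    "director-core/SKILL.md x 4": 12_000,
    "VP SKILL.md x 4": 6_000,
    "/self-brief x 4": 1_600,
    "/next-task x 8": 4_000,
    "end-session + handoff + postmortem x 4": 11_200,
}
before_total = 172_100  # stated total for the current architecture

after_total = sum(rewritten.values())
reduction = (before_total - after_total) / before_total * 100

print(f"after: {after_total:,} tokens, reduction: {reduction:.0f}%")
```

The rows sum to 35,900 tokens, and the reduction against the stated 172,100 rounds to 79%.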

Savings Breakdown by Category

What Changed | Before | After | Saved | How
Startup file reads | ~35,000 | 0 | 35,000 | preflight.py pre-computes
director/SKILL.md × 4 | 54,304 | 12,000 | 42,304 | Rewrite: 13.5K → 3K per load
QA skill × 8 tasks | 38,712 | 0 | 38,712 | Inline in VP skill (0 extra)
/next-task × 8 | 14,936 | 4,000 | 10,936 | Rewrite + local waterfall
Self-brief × 4 + file reads | ~38,000 | 1,600 | 36,400 | Reads 1 pre-built file each
morning + standup double | 3,686 | 600 | 3,086 | Merged + reads pre-built standup
Session-end cleanup | 2,000 | 0 | 2,000 | Local hook (end_session_local.sh)
End-session skills × 4 | 33,660 | 11,200 | 22,460 | Rewrite: end+handoff+postmortem compressed
TOTAL SAVED | | | ~136,200 | 79% of full morning