PROBLEM: A multi-agent system produced hundreds of unstructured messages a day across 8 VP agents — review time became the bottleneck.
WHY IT MATTERS: Without a classifier + routing layer, every message is a context-switch tax. With it, only the right thread reaches the right person, and per-session token cost becomes attributable.
STACK: Python (9-class classifier), SQLite (forum.db), Flask (localhost:5556 API), HTML/CSS/JS (this dashboard)
What happens AFTER Layer 3 (classification). Every category traced with color + preemption scoring.
Layer 4+: What Happens After Classification
Layer 3 output: category + subcategory + confidence. Layer 4 routes to one of 4 paths.
From Layer 3: Classified Issue
category + subcategory + confidence + skip_intake flag
▼
Layer 4: CATEGORY ROUTER
Routes to one of 4 paths based on category
▼
PATH A: BYPASS
No VP. No buffer. Respond and close.
question:metric → read file
question:status → read exec_log
question:cost → invoke analyst
question:process → VP Systems inline
report:session →
SKIP intake
report:shipped → ack + archive
correction:skill → direct edit
correction:naming → direct edit
✓ Auto-closed. Free. No interruption.
PATH B: 2-STAGE SCOPE
Scope first. Diane approves. THEN buffer.
Stage 1: /improve MINIMAL
Sanity + domain + analyst brief
Headless. ~Sonnet. ~5 min.
▼
Diane: "Worth exploring?"
[Yes --top X] or [No → close]
YES ▼
Stage 2: /improve FULL --top X
All 9 steps. Scout + Planner.
Headless. ~Opus. ~20 min.
▼
Diane: "Approve for VP?"
[Approve] [Defer] [Reject]
APPROVE ▼
BUFFER (delivery gate)
Wait for VP capacity. Batch related.
▼
VP receives pre-scoped work
PATH C: SCORED DELIVERY
Score decides: preempt, switch, or buffer.
PREEMPTION SCORER
base + keywords + tone + age + task_penalty
▼
▼
VP acts
Proposals also need Diane approval
PATH D: AUTO-PROCESS
System handles. No VP. No Diane.
status update → Outstanding.md
direction:priority → P0 flag
process:* → VP Systems edit
✓ Auto-processed and closed.
bug
feature
improvement
question
correction
direction
proposal
report
process
Path A: Bypass — No VP, No Buffer, Respond and Close
Cheapest path. System answers directly. Zero interruption to VPs.
question:factual
e.g., "Where is the config file for reporter?"
respond inline with file path
Cost: free | Time: <30s
question:metric
e.g., "What's the JPY Sharpe ratio?"
read tracker.csv / Views.md
respond with value + source
Cost: free | Time: <30s
question:cost
e.g., "How much did today's sessions cost?"
respond with cost breakdown
Cost: ~Sonnet | Time: ~2 min
correction:skill / correction:naming
e.g., "Rename the postmortem to include time"
reply: "Updated. Commit abc1234."
Cost: free | Time: ~5 min
report:session (VP reports)
e.g., "VP ML session: DirectLGBM shipped"
These are VP→Diane FYI. Not actionable issues. Filtered by classifier.
status update (blocker cleared)
e.g., "Registered in Play Store (F.2 done)"
detect milestone/blocker reference
update Outstanding.md + tracker.csv
reply: "F.2 marked done."
Cost: free | Milestone-specific update.
Path B: Two-Stage Scoping — Feature & Improvement Flow
Diane controls both gates. VP never sees unscoped work.
Issue classified as: feature or improvement or question:architecture
▼
STAGE 1: /improve MINIMAL (headless, no VP)
What runs:
✓ Sanity check (grep for duplicates)
✓ Domain research (WebSearch)
✓ Analyst brief (1-paragraph ROI)
What does NOT run:
✗ Scout (prior art deep dive)
✗ Planner (implementation plan)
✗ Full gap analysis (9 steps)
Cost: ~Sonnet | Time: ~5 min | Artifacts: minimal_brief
▼
GATE 1: Diane — "Worth exploring?"
"Exogenous variables for ML model"
Not duplicate. Domain research: macro indicators improve FX accuracy 15-30%.
Analyst: High confidence. Standard in FX literature.
Yes --top 5
No
Diane reviews async (phone/Forum Mobile). ~30 seconds.
NO ▼
CLOSED
No further cost.
Reason logged.
YES --top X ▼
STAGE 2: /improve FULL --top X
All 9 /improve steps run headless:
✓ Smoke test ✓ Domain deep dive ✓ KPI gap analysis
✓ Scout (prior art, build vs buy) ✓ Planner (milestones, tasks, estimates)
✓ Top X recommendations ranked by ROI
Cost: ~Opus | Time: ~20 min | Artifacts: improve_report + scout_report + implementation_plan
Diane controls X — how many results to produce (default 5, can set 3, 7, 10...)
▼
GATE 2: Diane — "Approve for VP?"
Top 5 recommendations:
1. Add DXY index (est 2h, +12% MAPE) ★
2. Add VIX volatility (est 1h, +8% MAPE)
3. Add yield spreads (est 3h, +5% MAPE)
Plan: 3 milestones, 8 tasks, est 6h total
Scout: yfinance free tier, Alpha Vantage API
Approve
Defer
Reject
APPROVE ▼
NOW ENTERS BUFFER
state: approved
Linked to milestone + KPI in tracker
Waits for VP capacity via delivery gate
▼
VP receives pre-scoped work
All artifacts attached. No in-session scoping.
Picked up via /next-task Level 2.5
Path C: Scored Delivery — Bugs, Directions, Proposals, Corrections
Preemption scorer decides WHEN the VP sees it. Buffer batches low-urgency items.
Issues routed to Path C:
bug:*
direction:task
proposal:*
correction:data/process
▼
PREEMPTION SCORER (5 components → 0-100)
-20
penalty
(task progress)
▼
≥70 PREEMPT
VP drops current task. Direct push.
When this fires:
Diane confirmed bug by testing (base=40)
Data loss/corruption detected (base=40)
"Stop!" + role violation (kw=30)
3+ rapid messages same issue (tone=20)
Real examples (from data):
• "Stop! Why are you updating skills??" → 80
• Email scanner 2nd bug + escalation → 95
• "very very important to me" (Discord) → 80
40-69 SWITCH
VP finishes current sub-task, then switches.
When this fires:
Stale state / redundancy catch (base=20+kw=20)
Domain knowledge injection (base=20+bonus)
"did you check..." (kw=15)
Prior decision override (base=30)
Real examples:
• "You only worked for 2 hours" → 65
• "M9-M12 are already done" → 40
• "did you check forum?" → 45
<40 BUFFER
Parked. VP pulls via /next-task when ready.
When this fires:
New bug report (not confirmed) (base=15)
Direction without urgency (base=15)
Proposal for later (base=15)
Buffer behavior:
• Batched by VP × priority × category
• 3 UI bugs → delivered as 1 batch
• Age escalation: +10 if open 2+ sessions
• VP pulls at /next-task Level 2.5
Real examples:
• Q.58b "Play Store next steps" → 25
• "I'd like diversity of datasets" → -2
Scorer Retraining
After each preemption decision, log: {issue_id, score, band, diane_agreed: yes/no}
Weekly: retrain component weights from Diane's labels. Same approach as classifier retraining.
Target: <2 misclassified preemptions per week.
Preemption Scorer — Full Component Breakdown
| Component | Signal | Score |
| Base Category (0-40) |
| QA failure confirmed by Diane using product | 40 |
| Data loss / corruption detected | 40 |
| Infrastructure failure blocking comms | 35 |
| Prior decision being violated | 30 |
| Role/process boundary violation | 30 |
| Cost waste at calculable rate | 25 |
| Domain knowledge injection | 20 |
| Stale state / redundancy catch | 20 |
| Feature priority pivot | 15 |
| New unconfirmed bug report | 15 |
| Design iteration feedback | 5 |
| Urgency Keywords (0-30) |
| "Stop!" / "very very" / "this is critical" | 30 |
| "I found a bug" / "X is broken" / two bugs same session | 25 |
| "based on earlier conversations we decided..." | 20 |
| "why is X..." (active use, something missing) | 20 |
| "did you check..." | 15 |
| "how about..." (domain question) | 10 |
| "I was thinking..." / "I'd like..." | 3 |
| Tone (0-20) |
| 3+ messages rapid succession about same issue | 20 |
| Past-tense confirmation ("I tested it, it...") | 15 |
| Explicit cost calculation presented | 15 |
| Escalation from earlier in same session | 10 |
| Single informational message | 0 |
| Age (0-10) |
| Issue raised 2+ sessions ago, still open | 10 |
| Issue raised earlier same session | 5 |
| New issue this session | 0 |
| Current Task Penalty (0 to -20) |
| Task >80% complete (about to commit) | -20 |
| Task 50-79% complete | -10 |
| Task <50% / debugging / stuck | 0 |
| Task is meta/process (not user-facing) | +5 |
Calibration: 9 Real Events
| Event | Base | Kw | Tone | Age | Pen | Total | Band | Match? |
| "Stop! Why are you updating skills??" | 30 | 30 | 15 | 5 | 0 | 80 | PREEMPT | ✓ |
| Email scanner 2nd bug + escalation | 40 | 25 | 20 | 5 | +5 | 95 | PREEMPT | ✓ |
| "very very important to me" (Discord) | 35 | 30 | 15 | 0 | 0 | 80 | PREEMPT | ✓ |
| "You only worked for 2 hours" | 30 | 20 | 15 | 0 | 0 | 65 | SWITCH | ✓ |
| "M9-M12 are already done" | 20 | 20 | 0 | 0 | 0 | 40 | SWITCH | ✓ |
| "did you check forum?" | 20 | 15 | 10 | 0 | 0 | 45 | SWITCH | ✓ |
| "how about exogenous variables?" | 20 | 10 | 0 | 0 | -10 | 20 | QUEUE | ~ |
| Q.58 "registered + need breakdown" | 15 | 10 | 0 | 0 | 0 | 25 | QUEUE | ✓ |
| "I'd like diversity of datasets" | 5 | 3 | 0 | 0 | -10 | -2 | QUEUE | ✓ |
8/9 correct. 1 edge case (~): Diane's domain questions during ML work occasionally have paradigm-level ROI. Needs VP-ML context bonus.