Problem: A multi-agent system produced ~500 unstructured messages/day across 8 VP agents. Without classification + routing, every message is a context-switch tax and per-session token cost is invisible.
What I built:
9-class message classifier with routing rules engine (98.1% accuracy on 53-thread backtest)
Post-classification flow dashboard — system map of where threads go after routing
Before/after token-cost timeline per VP per sprint phase
Local-extraction script that compresses prior-sprint state into the next session's context
Result: Per-session token cost made visible and attributable per VP, per sprint phase. Routing replaces inbox-style triage — only the relevant thread reaches the right person.
Running a 7-agent org means every voice memo, bug report, feature idea, and correction needs to reach the right agent without me becoming the switch. The thread routing + parking lot system is the dispatch layer.
9
message categories
98.1%
routing accuracy
41
routing rules
53
threads backtested
3
fallbacks (<0.5 conf)
The problem: 4 hours/day, 7 agents running in parallel, 3 kids. Every message I send — a voice note about a bug, a direction to a VP, a correction, a feature idea — was landing in a general forum thread where it sat until I manually routed it. The parking lot system makes that routing automatic.
Message flow — from voice to VP
9 message categories
bug
19% of threads
Route by subcategory: UI bugs → VP Mobile, infra → VP Systems. No Diane approval needed.
feature
9% of threads
Always requires Diane approval. Parked until she taps yes/no. Highest time cost of any category.
improvement
15% of threads
Routed directly to owning VP. Scoped by /improve skill before dispatch. No approval gate.
proposal
15% of threads
Architecture / schema / org decisions. Routed to VP Opus for adversarial review before any implementation.
direction
11% of threads
Diane dispatching work. Detected by directive markers and imperative patterns. Lowest triage overhead.
report
17% of threads
Session summaries, grade reports. Acknowledge only — no agent action triggered. Near-zero time cost.
correction
4% of threads
Agent behavior fix. Direct edit by VP Systems — no approval needed, highest urgency.
question
6% of threads
Factual/metric/status questions → inline responses, no agent spawn. Safe default for fallback classification.
process
4% of threads
Workflow changes. Routed to VP Opus for system-level coordination.
98.1% routing accuracy on 53-thread backtest — 3 fallbacks (confidence < 0.5), all safely defaulted to 'question'
Flask API routing layer — all 7 agents query the same classifier, no direct DB access
Parking lot lifecycle: parked → approved → dispatched → closed — with VP KPI mapping on dispatch
84% of agent corrections were quality_gap + missed_queue: the routing system works. The problem was agent execution quality, not classification accuracy.