Problem: A multi-agent system produced ~500 unstructured messages/day across 8 VP agents. Without classification + routing, every message is a context-switch tax and per-session token cost is invisible.
What I built:
Result: Per-session token cost made visible and attributable per VP, per sprint phase. Routing replaces inbox-style triage — only the relevant thread reaches the right person.
Stack: Python (classifier, routing rules), SQLite (forum.db with FTS5), Flask API, HTML/CSS/JS dashboards
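The result claim, per-session token cost attributable per VP and per sprint phase, comes down to a grouped aggregate. Below is a minimal sketch against an in-memory SQLite database. It assumes a hypothetical `sessions` table; the table, its columns (`vp`, `phase`, `input_tokens`, `output_tokens`), and the `token_cost_by_vp_phase` helper are illustrative and are not the real forum.db schema.

```python
import sqlite3

# Hypothetical schema: one row per agent session, tagged with the VP that ran it
# and the sprint phase it belongs to. Table and column names are illustrative.
SCHEMA = """
CREATE TABLE sessions (
    id            INTEGER PRIMARY KEY,
    vp            TEXT    NOT NULL,
    phase         TEXT    NOT NULL,
    input_tokens  INTEGER NOT NULL,
    output_tokens INTEGER NOT NULL
);
"""


def token_cost_by_vp_phase(conn: sqlite3.Connection) -> list[tuple[str, str, int, int]]:
    """Return (vp, phase, session_count, total_tokens) rows, costliest first."""
    return conn.execute(
        """
        SELECT vp, phase, COUNT(*), SUM(input_tokens + output_tokens) AS total
        FROM sessions
        GROUP BY vp, phase
        ORDER BY total DESC, vp, phase
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample rows, for illustration only
    conn.executemany(
        "INSERT INTO sessions (vp, phase, input_tokens, output_tokens) VALUES (?, ?, ?, ?)",
        [("VP Mobile", "build", 12000, 3000),
         ("VP Mobile", "build", 8000, 2000),
         ("VP Systems", "qa", 5000, 1000)],
    )
    for row in token_cost_by_vp_phase(conn):
        print(row)
```

Grouping on both columns gives one cost line per VP per phase. Pricing tokens in dollars would be one more multiplication on `total`.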
Multi-Agent Org · Thread Routing System

Routing 500 Agent Messages/Day

Running a 7-agent org means every voice memo, bug report, feature idea, and correction has to reach the right agent without me acting as the switchboard. The thread routing and parking lot system is that dispatch layer.
9 message categories
98.1% routing accuracy
41 routing rules
53 threads backtested
3 fallbacks (confidence < 0.5)
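A quick consistency check on the headline numbers, assuming the accuracy figure is computed over the 53 backtested threads: 98.1% corresponds to exactly 52 correctly routed threads and one miss.

```latex
\text{accuracy} = \frac{\text{correctly routed}}{\text{backtested}} = \frac{52}{53} \approx 0.981 = 98.1\%
```

No other whole-number count fits: 51/53 would be about 96.2%.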
The problem: 4 hours a day, 7 agents running in parallel, 3 kids. Every message I sent (a voice note about a bug, a direction to a VP, a correction, a feature idea) landed in a general forum thread and sat there until I routed it by hand. The parking lot system makes that routing automatic.
Message flow — from voice to VP
INPUT · Voice / text via Telegram or the forum
→ STORE · Forum DB, with metadata and timestamp
→ CLASSIFY · Rules classifier (9 categories, 41 rules)
→ TRIAGE · Parking lot: features need Diane, so they wait until she approves them
→ DISPATCH · VP workstream, as a KPI-mapped task
→ CLOSE · Shipped, with analytics tracked
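The flow can be read as a small state machine in which only the parking lot can hold a thread back. Below is a minimal sketch under stated assumptions. The `Stage` names mirror the diagram. The `Thread` fields, the `approved` flag, and `advance()` are hypothetical. Gating only `feature` threads follows the category notes, which say features always wait for Diane. Closing declined features without dispatch is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    INPUT = "input"
    STORE = "store"
    CLASSIFY = "classify"
    TRIAGE = "triage"        # the parking lot
    DISPATCH = "dispatch"
    CLOSE = "close"


ORDER = list(Stage)          # enum members iterate in definition order


@dataclass
class Thread:
    title: str
    category: str | None = None
    approved: bool | None = None   # hypothetical: Diane's yes/no on a parked feature
    stage: Stage = Stage.INPUT
    history: list[Stage] = field(default_factory=list)


def advance(thread: Thread) -> Thread:
    """Move a thread one stage forward; features are held in TRIAGE until Diane answers."""
    if thread.stage is Stage.TRIAGE and thread.category == "feature":
        if thread.approved is None:
            return thread                       # still parked
        if thread.approved is False:
            thread.history.append(thread.stage)
            thread.stage = Stage.CLOSE          # assumption: declined features close undispatched
            return thread
    idx = ORDER.index(thread.stage)
    if idx < len(ORDER) - 1:
        thread.history.append(thread.stage)
        thread.stage = ORDER[idx + 1]
    return thread


if __name__ == "__main__":
    t = Thread("Add dark mode", category="feature")
    for _ in range(5):
        advance(t)
    print(t.stage)   # Stage.TRIAGE: parked, waiting on Diane
    t.approved = True
    advance(t)
    advance(t)
    print(t.stage)   # Stage.CLOSE
```

Every other category passes through TRIAGE without stopping. That matches the "no approval needed" notes on bugs, improvements, and corrections.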
9 message categories
bug
19% of threads
Routed by subcategory: UI bugs → VP Mobile, infra bugs → VP Systems. No Diane approval needed.
feature
9% of threads
Always requires Diane approval. Parked until she taps yes/no. Highest time cost of any category.
improvement
15% of threads
Routed directly to the owning VP after scoping with the /improve skill. No approval gate.
proposal
15% of threads
Architecture / schema / org decisions. Routed to VP Opus for adversarial review before any implementation.
direction
11% of threads
Diane dispatching work. Detected by directive markers and imperative patterns. Lowest triage overhead.
report
17% of threads
Session summaries, grade reports. Acknowledge only — no agent action triggered. Near-zero time cost.
correction
4% of threads
Agent behavior fix. Direct edit by VP Systems — no approval needed, highest urgency.
question
6% of threads
Factual/metric/status questions → inline responses, no agent spawn. Safe default for fallback classification.
process
4% of threads
Workflow changes. Routed to VP Opus for system-level coordination.
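The per-category routing decisions above can be written as a data-driven table, so that adding a rule means adding a row rather than a new branch. Below is a hypothetical sketch. The `Route` fields, `ROUTES`, and `route()` are illustrative. Destinations come from the notes above where those notes name one. "owning VP" and "target VP" are placeholders because the real owner is resolved elsewhere.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    destination: str
    needs_diane_approval: bool = False
    spawns_agent: bool = True


# Hypothetical routing table keyed by (category, subcategory); None = any subcategory.
ROUTES: dict[tuple[str, str | None], Route] = {
    ("bug", "ui"): Route("VP Mobile"),
    ("bug", "infra"): Route("VP Systems"),
    ("feature", None): Route("parking lot", needs_diane_approval=True),
    ("improvement", None): Route("owning VP"),           # placeholder: resolved via ownership
    ("proposal", None): Route("VP Opus"),                # adversarial review before implementation
    ("direction", None): Route("target VP"),             # placeholder: VP named in the directive
    ("report", None): Route("acknowledge", spawns_agent=False),
    ("correction", None): Route("VP Systems"),
    ("question", None): Route("inline response", spawns_agent=False),
    ("process", None): Route("VP Opus"),
}


def route(category: str, subcategory: str | None = None) -> Route:
    """Most specific match first, then category-wide, then the 'question' safe default."""
    return (
        ROUTES.get((category, subcategory))
        or ROUTES.get((category, None))
        or ROUTES[("question", None)]
    )
```

Falling through to the `question` route mirrors the classifier's own fallback. Anything the table cannot place gets triaged rather than executed.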
Classifier — rules-based, 3-tier priority
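import re

# Three-tier classification (highest priority wins):
#   1. Diane's label (manual override, always wins; not handled in this function)
#   2. Rules-based pattern matching on title + first comment
#   3. Fallback: 'question' (safe default: triggers triage, not execution)
#
# Defined elsewhere in the classifier module (not shown):
#   CATEGORY_RULES: list of (regex pattern, category, weight) tuples, the 41 rules
#   _classify_subcategory(category, text): e.g. splits bugs into UI vs infra

def classify_thread(title: str, body: str) -> dict:
    # Rules match against title + first comment, case-insensitively
    text = (title + " " + body).lower()

    # Tier 2: the highest-weight matching rule wins (earliest rule on ties)
    best_cat, best_conf = None, 0.0
    for pattern, category, weight in CATEGORY_RULES:
        if weight > best_conf and re.search(pattern, text, re.IGNORECASE):
            best_cat, best_conf = category, weight

    # Tier 3: no match, or only weak matches, falls back to 'question'
    if best_conf < 0.5:
        return {"category": "question", "subcategory": None,
                "confidence": 0.3, "fallback": True}

    subcategory = _classify_subcategory(best_cat, text)
    return {"category": best_cat, "subcategory": subcategory,
            "confidence": best_conf, "fallback": False}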
Backtest results — 53 historical threads
Category | Count | Avg Diane time cost | Routing decision
bug | 10 (19%) | 1.7 (medium) | Subcategory split: UI → VP Mobile, infra → VP Systems. No approval.
feature | 5 (9%) | 2.6 (highest) | Always parked for Diane's approval first. Highest interruption cost.
improvement | 8 (15%) | 1.9 (medium-high) | Direct to owning VP after /improve scoping.
proposal | 8 (15%) | 2.2 (high) | VP Opus adversarial review gate before any implementation.
direction | 6 (11%) | 1.2 (low) | Detected by imperative markers. Lowest overhead to dispatch.
report | 9 (17%) | 0.4 (none/low) | Acknowledge only. No agent spawned.
correction | 2 (4%) | 1.5 (medium) | Direct VP Systems edit. No approval, highest urgency.
question | 3 (6%) | 2.3 (high) | Inline response where possible. No VP spawn if answerable from context.
process | 2 (4%) | 1.0 (low) | VP Opus coordination.
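A backtest like this one can be scored by comparing the classifier's output with a hand-assigned category for each historical thread. Below is a minimal sketch, assuming each thread has a ground-truth label. The `backtest` helper and its (predicted, true) input format are hypothetical. It returns overall accuracy, each category's share of threads (the percentage column above), and the confusion pairs behind any misses.

```python
from collections import Counter


def backtest(pairs: list[tuple[str, str]]) -> dict:
    """Score (predicted_category, true_category) pairs, one per historical thread."""
    total = len(pairs)
    correct = sum(pred == true for pred, true in pairs)
    counts = Counter(true for _, true in pairs)
    misses = Counter((true, pred) for pred, true in pairs if pred != true)
    return {
        "accuracy": correct / total if total else 0.0,
        "share": {cat: n / total for cat, n in counts.items()},   # e.g. bug: 10/53
        "misses": misses,   # (true, predicted) -> count
    }
```

Keeping the misses as (true, predicted) pairs shows where a new rule is needed, not just how many threads went astray.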
Root cause analysis — what prompted this
Correction type | Count | % | Root cause
quality_gap | 27 | 47% | Agents shipping incomplete work, not running QA before declaring done
missed_queue | 8 | 14% | Postmortems not written, threads not closed, forum housekeeping treated as optional
wrong_scope | 6 | 10% | Agents building things that already existed, with no cross-check before implementing
wrong_priority | 5 | 9% | Same corrections repeated across sessions, with no durable encoding of rules
wrong_route | 5 | 9% | Messages routed to the wrong VP without checking the ownership table

What shipped