
Problem: A multi-agent system produced ~500 unstructured messages/day across 8 VP agents. Without classification + routing, every message is a context-switch tax and per-session token cost is invisible.

What I built:

9-class message classifier with routing rules engine (98.1% accuracy on 53-thread backtest)
Post-classification flow dashboard — system map of where threads go after routing
Before/after token-cost timeline per VP per sprint phase
Local-extraction script that compresses prior-sprint state into the next session's context

Result: Per-session token cost is now visible and attributable per VP and per sprint phase. Routing replaces inbox-style triage: each thread reaches only the agent responsible for it.

Stack: Python (classifier, routing rules), SQLite (forum.db with FTS5), Flask API, HTML/CSS/JS dashboards

Multi-Agent Org · Thread Routing System

Routing 500 Agent Messages/Day

Running a 7-agent org means every voice memo, bug report, feature idea, and correction needs to reach the right agent without me becoming the switch. The thread routing + parking lot system is the dispatch layer.

9 message categories
98.1% routing accuracy
41 routing rules
53 threads backtested
3 fallbacks (<0.5 conf)

The problem: 4 hours/day, 7 agents running in parallel, 3 kids. Every message I send — a voice note about a bug, a direction to a VP, a correction, a feature idea — was landing in a general forum thread where it sat until I manually routed it. The parking lot system makes that routing automatic.

Message flow — from voice to VP

9 message categories

bug

19% of threads

Route by subcategory: UI bugs → VP Mobile, infra → VP Systems. No Diane approval needed.

feature

9% of threads

Always requires Diane approval. Parked until she taps yes/no. Highest time cost of any category.

improvement

15% of threads

Routed directly to owning VP. Scoped by /improve skill before dispatch. No approval gate.

proposal

15% of threads

Architecture / schema / org decisions. Routed to VP Opus for adversarial review before any implementation.

direction

11% of threads

Diane dispatching work. Detected by directive markers and imperative patterns. Low triage overhead (1.2 average time cost in the backtest).

report

17% of threads

Session summaries, grade reports. Acknowledge only — no agent action triggered. Near-zero time cost.

correction

4% of threads

Agent behavior fix. Direct edit by VP Systems — no approval needed, highest urgency.

question

6% of threads

Factual/metric/status questions → inline responses, no agent spawn. Safe default for fallback classification.

process

4% of threads

Workflow changes. Routed to VP Opus for system-level coordination.
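
The nine category rules above can be read as a lookup table. What follows is a minimal sketch under stated assumptions, not the system's actual rules engine: `Route`, `ROUTES`, `BUG_DESTINATIONS`, and `route_thread` are hypothetical names, and the destination strings for `feature` and `direction`, plus the defaults for unrecognized input, are guesses. Only the approval and agent-spawn flags are taken from the descriptions above.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    destination: str             # who or what receives the thread
    needs_diane_approval: bool   # True: park until Diane approves
    spawns_agent: bool           # False: acknowledge or answer inline


# Bug routing depends on the subcategory (from the "bug" card above).
BUG_DESTINATIONS = {"ui": "VP Mobile", "infra": "VP Systems"}

# Hypothetical encoding of the remaining eight categories.
ROUTES = {
    "feature":     Route("parking lot", True, True),      # agent work only after approval (assumption)
    "improvement": Route("owning VP", False, True),
    "proposal":    Route("VP Opus", False, True),
    "direction":   Route("addressed VP", False, True),    # destination not stated in the source
    "report":      Route("acknowledge only", False, False),
    "correction":  Route("VP Systems", False, True),
    "question":    Route("inline response", False, False),
    "process":     Route("VP Opus", False, True),
}


def route_thread(category: str, subcategory: Optional[str] = None) -> Route:
    """Map a classified thread to its route."""
    if category == "bug":
        # Assumption: bugs with an unknown subcategory go to VP Systems.
        destination = BUG_DESTINATIONS.get(subcategory, "VP Systems")
        return Route(destination, False, True)
    # Assumption: unknown categories use the same safe default as the classifier.
    return ROUTES.get(category, ROUTES["question"])
```

For example, `route_thread("bug", "ui").destination` is `"VP Mobile"`, and `route_thread("feature").needs_diane_approval` is `True`. Keeping the table as data rather than branching logic would make it easy to audit against an ownership table.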

Classifier — rules-based, 3-tier priority

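import re

# Three-tier classification (highest priority wins):
#   1. Diane's label (manual override, always wins)
#   2. Rules-based pattern matching on title + first comment
#   3. Fallback: 'question' (safe default: triggers triage, not execution)
#
# This function covers tiers 2 and 3. CATEGORY_RULES is a list of
# (regex pattern, category, weight) tuples; it and _classify_subcategory
# are not shown here.

def classify_thread(title: str, body: str) -> dict:
    # Lowercase once so the subcategory helper sees normalized text.
    text = (title + " " + body).lower()

    # Tier 2: keep the category of the highest-weight rule that matches.
    best_cat, best_conf = None, 0.0
    for pattern, category, weight in CATEGORY_RULES:
        if weight > best_conf and re.search(pattern, text, re.IGNORECASE):
            best_cat, best_conf = category, weight

    # Tier 3: nothing matched with enough confidence, so fall back.
    if best_conf < 0.5:
        return {"category": "question", "subcategory": None, "confidence": 0.3, "fallback": True}

    subcategory = _classify_subcategory(best_cat, text)
    return {"category": best_cat, "subcategory": subcategory, "confidence": best_conf, "fallback": False}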
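
To make the three tiers concrete, here is a self-contained sketch that adds the manual override in front of the rules pass. Everything in it is illustrative: the four regex rules and their weights are invented for the example (the real system has 41 rules that are not shown), the `manual_label` parameter and the `source` key are hypothetical, and assigning confidence 1.0 to a manual label is an assumption.

```python
import re
from typing import Optional

# Hypothetical rules: (regex, category, weight). Not the real rule set.
CATEGORY_RULES = [
    (r"\b(crash|broken|error|bug)\b", "bug", 0.9),
    (r"\b(proposal|propose|schema|architecture)\b", "proposal", 0.8),
    (r"\b(summary|grade report)\b", "report", 0.7),
    (r"\bmaybe\b", "feature", 0.4),  # weak signal, below the 0.5 cutoff
]

FALLBACK_THRESHOLD = 0.5


def classify(title: str, body: str, manual_label: Optional[str] = None) -> dict:
    # Tier 1: a manual label always wins (confidence 1.0 is an assumption).
    if manual_label:
        return {"category": manual_label, "confidence": 1.0, "source": "manual"}

    # Tier 2: the highest-weight matching rule decides the category.
    text = f"{title} {body}".lower()
    best_cat, best_conf = None, 0.0
    for pattern, category, weight in CATEGORY_RULES:
        if weight > best_conf and re.search(pattern, text):
            best_cat, best_conf = category, weight

    # Tier 3: low-confidence matches fall back to 'question'.
    if best_conf < FALLBACK_THRESHOLD:
        return {"category": "question", "confidence": 0.3, "source": "fallback"}
    return {"category": best_cat, "confidence": best_conf, "source": "rules"}


print(classify("App crash on login", "submit shows an error"))
print(classify("Maybe add dark mode", ""))
print(classify("App crash", "", manual_label="correction"))
```

The first call is decided by the rules tier, the second falls back because its only match has weight 0.4, and the third ignores the text entirely because a label was supplied.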

Backtest results — 53 historical threads

Category	Count	Avg Diane Time Cost	Routing decision
bug	10 (19%)	1.7 — medium	Subcategory split: UI → VP Mobile, infra → VP Systems. No approval.
feature	5 (9%)	2.6 — highest	Always parked for Diane approval first. Highest interruption cost.
improvement	8 (15%)	1.9 — medium-high	Direct to owning VP after /improve scoping.
proposal	8 (15%)	2.2 — high	VP Opus adversarial review gate before any implementation.
direction	6 (11%)	1.2 — low	Detected by imperative markers. Low overhead to dispatch.
report	9 (17%)	0.4 — none/low	Acknowledge only. No agent spawned.
correction	2 (4%)	1.5 — medium	Direct VP Systems edit. No approval, highest urgency.
question	3 (6%)	2.3 — high	Inline response where possible. No VP spawn if answerable from context.
process	2 (4%)	1.0 — low	VP Opus coordination.
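
A backtest like this one can be sketched as a loop over labeled threads that counts matches and fallbacks. The harness below is an assumption about the approach, not the script that produced the table: `backtest`, `toy_classify`, and the three sample threads are hypothetical. The arithmetic is the only part tied to the source: 98.1% on 53 threads is consistent with 52 correct routings.

```python
from typing import Callable, Iterable, Tuple

Thread = Tuple[str, str, str]  # (title, body, expected category)


def backtest(classify: Callable[[str, str], dict], threads: Iterable[Thread]) -> dict:
    """Run a classifier over labeled threads and summarize the outcome."""
    total = correct = fallbacks = 0
    misses = []
    for title, body, expected in threads:
        result = classify(title, body)
        total += 1
        if result.get("fallback"):
            fallbacks += 1
        if result["category"] == expected:
            correct += 1
        else:
            misses.append((title, expected, result["category"]))
    accuracy = correct / total if total else 0.0
    return {"total": total, "correct": correct, "accuracy": accuracy,
            "fallbacks": fallbacks, "misses": misses}


def toy_classify(title: str, body: str) -> dict:
    # Stand-in classifier so the harness runs on its own.
    if "crash" in (title + " " + body).lower():
        return {"category": "bug", "confidence": 0.9}
    return {"category": "question", "confidence": 0.3, "fallback": True}


sample = [
    ("App crash on login", "", "bug"),
    ("What did last sprint cost?", "", "question"),
    ("Add dark mode", "", "feature"),
]
report = backtest(toy_classify, sample)
print(report["accuracy"], report["fallbacks"], report["misses"])

# Arithmetic check on the headline figure: 52 of 53 rounds to 98.1%.
print(round(100 * 52 / 53, 1))
```

Recording the misses alongside the accuracy keeps the single misrouted thread inspectable instead of hiding it inside a percentage.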

Root cause analysis — what prompted this

Correction type	Count	%	Root cause
quality_gap	27	47%	Agents shipping incomplete work, not running QA before declaring done
missed_queue	8	14%	Postmortems not written, threads not closed, forum housekeeping treated as optional
wrong_scope	6	10%	Agents building things that already existed — no cross-check before implementing
wrong_priority	5	9%	Same corrections repeated across sessions — no durable encoding of rules
wrong_route	5	9%	Messages routed to the wrong VP without checking ownership table

What shipped

Rules-based classifier: 9 categories, 41 routing rules, 3-tier priority (manual → rules → fallback)
98.1% routing accuracy on 53-thread backtest — 3 fallbacks (confidence < 0.5), all safely defaulted to 'question'
Flask API routing layer — all 7 agents query the same classifier, no direct DB access
Parking lot lifecycle: parked → approved → dispatched → closed — with VP KPI mapping on dispatch
61% of agent corrections were quality_gap + missed_queue (47% + 14% in the root cause table), against 9% for wrong_route: the routing system works. The problem was agent execution quality, not classification accuracy.
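
The parking lot lifecycle listed under "What shipped" (parked → approved → dispatched → closed) behaves like a small state machine. The sketch below shows one way to enforce it, under stated assumptions: `ParkingState`, `TRANSITIONS`, and `advance` are hypothetical names, and the parked → closed shortcut for a rejected thread is a guess based on the yes/no approval step. The VP KPI mapping on dispatch is not modeled.

```python
from enum import Enum


class ParkingState(Enum):
    PARKED = "parked"
    APPROVED = "approved"
    DISPATCHED = "dispatched"
    CLOSED = "closed"


# Allowed moves. PARKED -> CLOSED models a "no" from Diane (assumption).
TRANSITIONS = {
    ParkingState.PARKED: {ParkingState.APPROVED, ParkingState.CLOSED},
    ParkingState.APPROVED: {ParkingState.DISPATCHED},
    ParkingState.DISPATCHED: {ParkingState.CLOSED},
    ParkingState.CLOSED: set(),
}


def advance(current: ParkingState, target: ParkingState) -> ParkingState:
    """Return the new state, or raise if the move skips a lifecycle step."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = ParkingState.PARKED
for step in (ParkingState.APPROVED, ParkingState.DISPATCHED, ParkingState.CLOSED):
    state = advance(state, step)
print(state.value)
```

Rejecting illegal moves at this layer means a thread cannot be dispatched without passing through approval, whichever agent calls the API.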