Diane Doten

Staff Data Scientist / ML — building LLM-optimized systems
Five systems below. Each was built solo over the past several months in 2–4 hour daily windows. This page is itself a sample: a static dashboard linking to deeper dashboards, every page hand-built. Most of this work is not on my résumé, but it's where my recent hours have gone: LLM cost optimization, multi-agent orchestration, on-device ML, end-to-end pipelines, and dashboards as the operational surface.
Recent work · 5 systems
[Diagram: 9-class classifier routing across 8 VP agents (VP Systems, VP ML, VP Mobile, ...) · classify → route]
01 · LLM Cost / Context Engineering
Thread Routing & Context Optimization
9-class classifier → per-VP token cost attribution
Problem: A multi-agent system produced hundreds of unstructured messages a day across 8 VP agents — review time became the bottleneck.
Why it matters: Without classification + routing, every message is a context-switch tax. With it, only the right thread reaches the right person, and per-session token cost becomes attributable.
What I built:
  • 9-class message classifier with routing rules
  • Post-classification flow dashboard (system map of where threads go)
  • Before/after token-cost timeline per VP per sprint phase
  • Local-extraction script that compresses prior-sprint state into the next session's context
Result: Per-session token cost made visible and attributable per VP, per sprint phase. Routing replaces inbox-style triage — only the relevant thread reaches the right person.
Stack: Python (classifier), SQLite (forum.db with FTS5), Flask (localhost:5556 API), HTML/CSS/JS
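A minimal sketch of the classify → route → attribute loop, in Python. The class labels, routing table, and keyword scorer here are illustrative stand-ins; the real system uses a trained 9-class classifier against the forum.db schema, not this toy:

import sqlite3

ROUTES = {  # hypothetical class -> VP routing table; the real one covers 9 classes and 8 VPs
    "build_failure": "VP Systems",
    "model_regression": "VP ML",
    "app_crash": "VP Mobile",
}

def classify(message: str) -> str:
    """Keyword stand-in for the trained 9-class message classifier."""
    for label in ROUTES:
        if label.replace("_", " ") in message.lower():
            return label
    return "unrouted"

def route_and_attribute(db: sqlite3.Connection, message: str, tokens: int) -> str:
    """Route one message to a VP and log its token cost for attribution."""
    label = classify(message)
    vp = ROUTES.get(label, "triage")
    db.execute(
        "INSERT INTO token_costs (vp, label, tokens) VALUES (?, ?, ?)",
        (vp, label, tokens),
    )
    return vp

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE token_costs (vp TEXT, label TEXT, tokens INTEGER)")
print(route_and_attribute(db, "Model regression on the eval set", tokens=842))  # -> VP ML

Summing token_costs by VP (and by sprint phase, in the real schema) is what makes the before/after cost timeline possible.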
[Diagram: 10 mechanical phase gates · entry contract → manifest + forum → preflight lint+fix → ownership → approval → worktree gate → dispatch + validate → handoff freshness → output QA · coordinator has no git access, enforcement is mechanical · 7 failure patterns eliminated]
02 · Multi-Agent Orchestration
Wave Run v2 — Prose Skill → Airflow DAG
10 mechanical phase gates; LLM removed from orchestration path
Problem: A 10-phase prose skill for running parallel LLM agents produced 7 distinct, repeated failure patterns. Cost: 4–6 hours of merge cleanup per sprint.
Why it matters: Prose instructions are not enforcement. An LLM with all the orchestration info in context will improvise under pressure. The fix isn't clearer prose — it's removing the LLM from the orchestration decision entirely.
What I built:
  • Codified the 10-phase skill into wave_run_v2 Airflow DAG
  • Mechanical phase gates: entrypoint contract → manifest load → forum bind → preflight lint with autofix → ownership check → typed approval → worktree gate → dispatch → freshness-validated handoff → output validation
  • Coordinator (DAG) has no git access; orchestrator (separate session) does — separation is mechanical, not by convention
Result: 7 documented failure patterns from the prose-skill version eliminated mechanically. 4–6 Diane-hours of merge cleanup per sprint reclaimed.
Stack: Airflow 3.x, Python, Celery, cmux, Postgres + Redis, manifest.yaml + manifest.db
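The gate chain, sketched as an Airflow task graph. This assumes Airflow 3's airflow.sdk decorators (Airflow 2.x imports from airflow.decorators); gate names mirror the list above, bodies are placeholders, and only three of the ten gates are shown. The point is the shape: each gate fails the run mechanically rather than asking an LLM to remember a rule:

from airflow.sdk import dag, task

@dag(dag_id="wave_run_v2_sketch", schedule=None)
def wave_run_v2_sketch():
    @task
    def entrypoint_contract() -> dict:
        # Validate invocation args against a typed contract; hard-fail on drift.
        return {"manifest_path": "manifest.yaml"}

    @task
    def preflight_lint(ctx: dict) -> dict:
        # Lint the worktrees and autofix before any agent is dispatched.
        return ctx

    @task
    def dispatch(ctx: dict) -> None:
        # The coordinator dispatches agents but holds no git credentials,
        # so merges can only happen in the orchestrator session.
        ...

    dispatch(preflight_lint(entrypoint_contract()))

wave_run_v2_sketch()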
[Diagram: L1 · 61 deterministic probes ($0) → L2 · YAML repair recipes ($0) → L3 · Claude webhook ($0.10) → L4 · Telegram tap (human) → L5 · weekly digest · Diane sits at L4: escalation ends with the human, doesn't start there]
03 · LLM Cost Optimization
Airflow 5-Tier Watchdog
Deterministic L1–L2 absorb most incidents; LLM cost capped at $0.10
Problem: A 24/7 monitoring system can't afford an LLM call in every hot path. At 1 incident per 10 min × $0.50/incident, that's $72/day on a system that lives indefinitely.
Why it matters: Most production events are deterministic. Reserve LLM cost for events that genuinely need reasoning. Tier the watchdog so the cheap layers fail-fast and the expensive layer is the exception, not the default.
What I built:
  • L1 probes — 24/7 deterministic checks ($0/incident, 61-probe catalog)
  • L2 recipes — predefined repair scripts ($0/incident, YAML-keyed)
  • L3 webhook — Claude routine for the L2 misses ($0.05–0.20/incident)
  • L4 Telegram tap — human approval for proposed patches
  • L5 weekly digest — batched unresolved incidents
Result: 12 deliverables (D1–D12) shipped + merged 2026-05-01. LLM cost capped at single-digit dollars per week. Diane sits at L4, not L1 — escalation flow ends with the human, doesn't start with them.
Stack: Python (health_check.py, health_fix.py), YAML config, Anthropic Claude routine, Telegram Bot API, cron + plist
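The tier walk, sketched in Python. Probe and recipe names are illustrative, and the webhook and Telegram calls are stubs; what matters is the control flow, deterministic layers first and paid reasoning only on a miss:

from typing import Callable, Optional

L1_PROBES: dict[str, Callable[[], bool]] = {
    "scheduler_heartbeat": lambda: True,  # stand-in for one of the 61 deterministic checks
}
L2_RECIPES: dict[str, Callable[[], bool]] = {
    "scheduler_heartbeat": lambda: True,  # stand-in for a YAML-keyed repair script
}

def call_llm_webhook(probe: str) -> Optional[str]:
    """Stub for the L3 Claude webhook; returns a proposed patch or None."""
    return f"proposed patch for {probe}"

def queue_for_human_approval(patch: str) -> str:
    """Stub for the L4 Telegram tap."""
    return f"awaiting approval: {patch}"

def handle_incident(probe: str) -> str:
    if L1_PROBES.get(probe, lambda: False)():    # L1: $0 re-check; True means actually healthy
        return "healthy"
    if L2_RECIPES.get(probe, lambda: False)():   # L2: $0 scripted repair
        return "repaired"
    patch = call_llm_webhook(probe)              # L3: the only paid step
    if patch is not None:
        return queue_for_human_approval(patch)   # L4: human decides, never the LLM
    return "weekly_digest"                       # L5: batch whatever is left

print(handle_incident("scheduler_heartbeat"))    # -> healthy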
[Diagram: child audio waveform in → MFCC + cosine model (15.4KB, on-device) → classify in <100ms on Android · 97% accuracy · 66 phonemes]
04 · On-Device ML
Phoneme Classifier — 97% accuracy on-device
15.4KB MFCC model · 66 phonemes · no cloud · <100ms Android
Problem: Speech recognition for 4-year-olds, on-device, on a phone, without sending children's audio to any cloud.
Why it matters: Cloud calls would mean COPPA compliance overhead, latency, and ongoing inference cost. Most speech models are MB-to-GB. The constraint: fit accurate phoneme recognition in tens of KB that runs on a low-end Android in <100ms.
What I built:
  • Two-track classifier — Track A (MFCC + cosine, 15.4KB asset), Track B (WavLM fine-tune, ONNX export)
  • VTLN factor 1.104 derived from child F0 mean (269 Hz) — adult-trained features warped to child vocal tract
  • Sander 1972 substitution table for age-gated developmental allowances
  • 4-gate noise rejector for bedroom-recording realism
  • Confusion matrix audit, 120 unit tests across 9 files
Result: 97.0% accuracy on full 66-phoneme test set (only 2 phonetically-identical confusions). 100% on Diane's voice baseline. Production-shipping in L2R Android app.
Stack: Python (librosa, scikit-learn, ONNX, PyTorch for Track B), Kotlin (custom PhonemeClassifier on Android)
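The Track A shape, sketched with librosa. Clipping the mel range by the warp factor is a crude stand-in for the real VTLN feature warp, and the template store is illustrative; the shipped classifier is a custom Kotlin implementation of the same idea:

import numpy as np
import librosa

WARP = 1.104  # adult -> child vocal-tract warp factor (from child F0 mean, 269 Hz)

def embed(y: np.ndarray, sr: int) -> np.ndarray:
    # Compress the mel frequency range by the warp factor; the real model
    # warps the features properly rather than just lowering fmax.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, fmax=(sr / 2) / WARP)
    v = mfcc.mean(axis=1)
    return v / np.linalg.norm(v)

def classify(y: np.ndarray, sr: int, templates: dict[str, np.ndarray]) -> str:
    # templates: one unit-norm MFCC vector per phoneme (66 entries in the real asset)
    v = embed(y, sr)
    return max(templates, key=lambda p: float(np.dot(v, templates[p])))

Unit-norm vectors make the dot product equal to cosine similarity, which is why a 15.4KB template table can stand in for a full model.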
[Diagram: human video → CharacterScaler (per-animal manifest) → scale + emit → Rive output · ~1200 MCP calls/animal · 9 gestures, auto-derived]
05 · End-to-End ML Pipeline
Pose Extraction → Find the Sound
Human video → 2D pose → CharacterScaler → Rive animal animation
Problem: Animations for a kids' phonics app need expressive face/body params for 10 animals. No animator on staff. Hand-keyframing 10 animals × 9 gestures = months I don't have.
Why it matters: Product surface depends on animation fidelity — kids respond to expressive characters. The pipeline has to take human-recorded reference video as input and produce Rive-ready params as output, automatically, per-animal.
What I built:
  • End-to-end pipeline: human video → 2D pose → CharacterScaler per-animal → Rive runtime params
  • Video-as-floor, MoCap-as-ceiling: Diane's verified videos set magnitude floor; BABEL/AMASS MoCap can add but not override
  • Per-animal scaling from manifest (not hardcoded) — adding the 10th animal requires zero code changes
  • Ships keyframes via Rive MCP (~1200 calls per animal); post-wire structural eval
Result: Production-shipping in L2R V8 release (Play Store gate passed). 5 base videos × multiple animals × 9 gestures, all auto-derived. Reproducible via one batch command.
Stack: Python (RTMLib 2D landmarks, BABEL/AMASS, savgol smoothing, RDP), Rive runtime (Kotlin Android), JSON manifests, MCP-driven .riv emission
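The manifest-scaling step, sketched in Python. Manifest fields and parameter names here are illustrative; the real manifests carry more than gains, and emission happens over Rive MCP rather than as a return value:

import json

def scale_params(pose_params: dict[str, float], manifest_path: str, animal: str) -> dict[str, float]:
    # Per-animal gains live in data, not code: animal #10 is a manifest
    # entry, not a code change.
    with open(manifest_path) as f:
        manifest = json.load(f)
    gains = manifest[animal]["param_gains"]  # e.g. {"head_tilt": 0.6, ...}
    return {name: value * gains.get(name, 1.0) for name, value in pose_params.items()}

# usage: scale_params({"head_tilt": 12.0}, "animals.json", "fox")
# the scaled params are what get emitted as Rive keyframes over MCP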