Diane Doten

Staff Data Scientist / ML — building LLM-optimized systems
Five systems below. Each was built solo over the past several months in 2–4 hour daily windows. This page is itself a sample: a static dashboard linking to deeper dashboards, every page hand-built. Most of this work is not on my résumé, but it's where my recent hours have gone: LLM cost optimization, multi-agent orchestration, on-device ML, end-to-end pipelines, and dashboards as the operational surface.
Recent work · 5 systems
[Diagram: 9-class classifier routing across 8 VP agents (VP Systems, VP ML, VP Mobile, ...) · classify → route]
01 · LLM Cost / Context Engineering
Thread Routing & Context Optimization
9-class classifier → per-VP token cost attribution
Problem: A multi-agent system produced hundreds of unstructured messages a day across 8 VP agents — review time became the bottleneck.
Why it matters: Without classification + routing, every message is a context-switch tax. With it, only the right thread reaches the right person, and per-session token cost becomes attributable.
What I built:
  • 9-class message classifier with routing rules
  • Post-classification flow dashboard (system map of where threads go)
  • Before/after token-cost timeline per VP per sprint phase
  • Local-extraction script that compresses prior-sprint state into the next session's context
Result: Per-session token cost made visible and attributable per VP, per sprint phase. Routing replaces inbox-style triage — only the relevant thread reaches the right person.
Stack: Python (classifier), SQLite (forum.db with FTS5), Flask (localhost:5556 API), HTML/CSS/JS
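A minimal sketch of the classify → route → attribute loop, in Python. The class labels, routing table, and keyword scorer here are illustrative stand-ins; the real system uses a trained 9-class classifier against the forum.db schema, not this toy:

import sqlite3

ROUTES = {  # hypothetical class -> VP routing table; the real one covers 9 classes and 8 VPs
    "build_failure": "VP Systems",
    "model_regression": "VP ML",
    "app_crash": "VP Mobile",
}

def classify(message: str) -> str:
    """Keyword stand-in for the trained 9-class message classifier."""
    for label in ROUTES:
        if label.replace("_", " ") in message.lower():
            return label
    return "unrouted"

def route_and_attribute(db: sqlite3.Connection, message: str, tokens: int) -> str:
    """Route one message to a VP and log its token cost for attribution."""
    label = classify(message)
    vp = ROUTES.get(label, "triage")
    db.execute(
        "INSERT INTO token_costs (vp, label, tokens) VALUES (?, ?, ?)",
        (vp, label, tokens),
    )
    return vp

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE token_costs (vp TEXT, label TEXT, tokens INTEGER)")
print(route_and_attribute(db, "Model regression on the eval set", tokens=842))  # -> VP ML

Summing token_costs by VP (and by sprint phase, in the real schema) is what makes the before/after cost timeline possible.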
[Diagram: 10 mechanical phase gates · entry contract → manifest + forum → preflight lint+fix → ownership → approval → worktree gate → dispatch + validate → handoff freshness → output QA · coordinator has no git access, enforcement is mechanical · 7 failure patterns eliminated]
02 · Multi-Agent Orchestration
Wave Run v2 — Prose Skill → Airflow DAG
10 mechanical phase gates; LLM removed from orchestration path
Problem: A 10-phase prose skill for running parallel LLM agents produced 7 distinct, repeated failure patterns. Cost: 4–6 hours of merge cleanup per sprint.
Why it matters: Prose instructions are not enforcement. An LLM with all the orchestration info in context will improvise under pressure. The fix isn't clearer prose — it's removing the LLM from the orchestration decision entirely.
What I built:
  • Codified the 10-phase skill into wave_run_v2 Airflow DAG
  • Mechanical phase gates: entrypoint contract → manifest load → forum bind → preflight lint with autofix → ownership check → typed approval → worktree gate → dispatch → freshness-validated handoff → output validation
  • Coordinator (DAG) has no git access; orchestrator (separate session) does — separation is mechanical, not by convention
Result: 7 documented failure patterns from the prose-skill version eliminated mechanically. 4–6 Diane-hours of merge cleanup per sprint reclaimed.
Stack: Airflow 3.x, Python, Celery, cmux, Postgres + Redis, manifest.yaml + manifest.db
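The gate chain, sketched as an Airflow task graph. This assumes Airflow 3's airflow.sdk decorators (Airflow 2.x imports from airflow.decorators); gate names mirror the list above, bodies are placeholders, and only three of the ten gates are shown. The point is the shape: each gate fails the run mechanically rather than asking an LLM to remember a rule:

from airflow.sdk import dag, task

@dag(dag_id="wave_run_v2_sketch", schedule=None)
def wave_run_v2_sketch():
    @task
    def entrypoint_contract() -> dict:
        # Validate invocation args against a typed contract; hard-fail on drift.
        return {"manifest_path": "manifest.yaml"}

    @task
    def preflight_lint(ctx: dict) -> dict:
        # Lint the worktrees and autofix before any agent is dispatched.
        return ctx

    @task
    def dispatch(ctx: dict) -> None:
        # The coordinator dispatches agents but holds no git credentials,
        # so merges can only happen in the orchestrator session.
        ...

    dispatch(preflight_lint(entrypoint_contract()))

wave_run_v2_sketch()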
[Diagram: L1 · 61 deterministic probes ($0) → L2 · YAML repair recipes ($0) → L3 · Claude webhook ($0.10) → L4 · Telegram tap (human) → L5 · weekly digest · Diane sits at L4: escalation ends with the human, doesn't start there]
03 · LLM Cost Optimization
Airflow 5-Tier Watchdog
Deterministic L1–L2 absorb most incidents; LLM cost capped at $0.10
Problem: A 24/7 monitoring system can't afford an LLM call in every hot path. At 1 incident per 10 min × $0.50/incident, that's $72/day on a system that lives indefinitely.
Why it matters: Most production events are deterministic. Reserve LLM cost for events that genuinely need reasoning. Tier the watchdog so the cheap layers fail-fast and the expensive layer is the exception, not the default.
What I built:
  • L1 probes — 24/7 deterministic checks ($0/incident, 61-probe catalog)
  • L2 recipes — predefined repair scripts ($0/incident, YAML-keyed)
  • L3 webhook — Claude routine for the L2 misses ($0.05–0.20/incident)
  • L4 Telegram tap — human approval for proposed patches
  • L5 weekly digest — batched unresolved incidents
Result: 12 deliverables (D1–D12) shipped + merged 2026-05-01. LLM cost capped at single-digit dollars per week. Diane sits at L4, not L1 — escalation flow ends with the human, doesn't start with them.
Stack: Python (health_check.py, health_fix.py), YAML config, Anthropic Claude routine, Telegram Bot API, cron + plist
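The tier walk, sketched in Python. Probe and recipe names are illustrative, and the webhook and Telegram calls are stubs; what matters is the control flow, deterministic layers first and paid reasoning only on a miss:

from typing import Callable, Optional

L1_PROBES: dict[str, Callable[[], bool]] = {
    "scheduler_heartbeat": lambda: True,  # stand-in for one of the 61 deterministic checks
}
L2_RECIPES: dict[str, Callable[[], bool]] = {
    "scheduler_heartbeat": lambda: True,  # stand-in for a YAML-keyed repair script
}

def call_llm_webhook(probe: str) -> Optional[str]:
    """Stub for the L3 Claude webhook; returns a proposed patch or None."""
    return f"proposed patch for {probe}"

def queue_for_human_approval(patch: str) -> str:
    """Stub for the L4 Telegram tap."""
    return f"awaiting approval: {patch}"

def handle_incident(probe: str) -> str:
    if L1_PROBES.get(probe, lambda: False)():    # L1: $0 re-check; True means actually healthy
        return "healthy"
    if L2_RECIPES.get(probe, lambda: False)():   # L2: $0 scripted repair
        return "repaired"
    patch = call_llm_webhook(probe)              # L3: the only paid step
    if patch is not None:
        return queue_for_human_approval(patch)   # L4: human decides, never the LLM
    return "weekly_digest"                       # L5: batch whatever is left

print(handle_incident("scheduler_heartbeat"))    # -> healthy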
[Diagram: child audio waveform in → MFCC + cosine model (15.4KB, on-device) → classify in <100ms on Android · 97% accuracy · 66 phonemes]
04 · On-Device ML
Phoneme Classifier — 97% accuracy on-device
15.4KB MFCC model · 66 phonemes · no cloud · <100ms Android
Problem: Speech recognition for 4-year-olds, on-device, on a phone, without sending children's audio to any cloud.
Why it matters: Cloud calls would mean COPPA compliance overhead, latency, and ongoing inference cost. Most speech models are MB-to-GB. The constraint: fit accurate phoneme recognition in tens of KB that runs on a low-end Android in <100ms.
What I built:
  • Two-track classifier — Track A (MFCC + cosine, 15.4KB asset), Track B (WavLM fine-tune, ONNX export)
  • VTLN factor 1.104 derived from child F0 mean (269 Hz) — adult-trained features warped to child vocal tract
  • Sander 1972 substitution table for age-gated developmental allowances
  • 4-gate noise rejector for bedroom-recording realism
  • Confusion matrix audit, 120 unit tests across 9 files
Result: 97.0% accuracy on full 66-phoneme test set (only 2 phonetically-identical confusions). 100% on Diane's voice baseline. Production-shipping in L2R Android app.
Stack: Python (librosa, scikit-learn, ONNX, PyTorch for Track B), Kotlin (custom PhonemeClassifier on Android)
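The Track A shape, sketched with librosa. Clipping the mel range by the warp factor is a crude stand-in for the real VTLN feature warp, and the template store is illustrative; the shipped classifier is a custom Kotlin implementation of the same idea:

import numpy as np
import librosa

WARP = 1.104  # adult -> child vocal-tract warp factor (from child F0 mean, 269 Hz)

def embed(y: np.ndarray, sr: int) -> np.ndarray:
    # Compress the mel frequency range by the warp factor; the real model
    # warps the features properly rather than just lowering fmax.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, fmax=(sr / 2) / WARP)
    v = mfcc.mean(axis=1)
    return v / np.linalg.norm(v)

def classify(y: np.ndarray, sr: int, templates: dict[str, np.ndarray]) -> str:
    # templates: one unit-norm MFCC vector per phoneme (66 entries in the real asset)
    v = embed(y, sr)
    return max(templates, key=lambda p: float(np.dot(v, templates[p])))

Unit-norm vectors make the dot product equal to cosine similarity, which is why a 15.4KB template table can stand in for a full model.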
[Diagram: human video → CharacterScaler (per-animal manifest) → scale + emit → Rive output · ~1200 MCP calls/animal · 9 gestures, auto-derived]
05 · End-to-End ML Pipeline
Pose Extraction → Find the Sound
Human video → 2D pose → CharacterScaler → Rive animal animation
Problem: Animations for a kids' phonics app need expressive face/body params for 10 animals. No animator on staff. Hand-keyframing 10 animals × 9 gestures = months I don't have.
Why it matters: Product surface depends on animation fidelity — kids respond to expressive characters. The pipeline has to take human-recorded reference video as input and produce Rive-ready params as output, automatically, per-animal.
What I built:
  • End-to-end pipeline: human video → 2D pose → CharacterScaler per-animal → Rive runtime params
  • Video-as-floor, MoCap-as-ceiling: Diane's verified videos set magnitude floor; BABEL/AMASS MoCap can add but not override
  • Per-animal scaling from manifest (not hardcoded) — adding the 10th animal requires zero code changes
  • Ships keyframes via Rive MCP (~1200 calls per animal); post-wire structural eval
Result: Production-shipping in L2R V8 release (Play Store gate passed). 5 base videos × multiple animals × 9 gestures, all auto-derived. Reproducible via one batch command.
Stack: Python (RTMLib 2D landmarks, BABEL/AMASS, savgol smoothing, RDP), Rive runtime (Kotlin Android), JSON manifests, MCP-driven .riv emission
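The manifest-scaling step, sketched in Python. Manifest fields and parameter names here are illustrative; the real manifests carry more than gains, and emission happens over Rive MCP rather than as a return value:

import json

def scale_params(pose_params: dict[str, float], manifest_path: str, animal: str) -> dict[str, float]:
    # Per-animal gains live in data, not code: animal #10 is a manifest
    # entry, not a code change.
    with open(manifest_path) as f:
        manifest = json.load(f)
    gains = manifest[animal]["param_gains"]  # e.g. {"head_tilt": 0.6, ...}
    return {name: value * gains.get(name, 1.0) for name, value in pose_params.items()}

# usage: scale_params({"head_tilt": 12.0}, "animals.json", "fox")
# the scaled params are what get emitted as Rive keyframes over MCP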