Diane Doten

Staff Data Scientist / ML — building LLM-optimized systems
Five systems below, each built solo over the past several months in 2–4 hour daily windows. The container itself is a sample: a static dashboard linking to deeper dashboards, every page hand-built. Most of this work is not on my résumé, but it is where my recent hours have gone: LLM cost optimization, multi-agent orchestration, on-device ML, end-to-end pipelines, and dashboards as the operational surface.
— Recent work · 5 systems
01 · LLM Cost / Context Engineering
Thread Routing & Context Optimization
Problem: A multi-agent system produced hundreds of unstructured messages a day across 8 VP agents — review time became the bottleneck.
Why it matters: Without classification + routing, every message is a context-switch tax. With it, only the right thread reaches the right person, and per-session token cost becomes attributable.
What I built:
Result: Per-session token cost made visible and attributable per VP and per sprint phase. Routing replaces inbox-style triage, so only the relevant thread reaches the right person.
Stack: Python (classifier), SQLite (forum.db with FTS5), Flask (localhost:5556 API), HTML/CSS/JS
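A minimal sketch of the FTS5 routing idea, using Python's stdlib sqlite3. The table and column names (messages, vp, body) and the sample rows are illustrative, not the actual forum.db schema:

```python
import sqlite3

# In-memory FTS5 index standing in for forum.db; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE messages USING fts5(vp, body)")
conn.executemany(
    "INSERT INTO messages (vp, body) VALUES (?, ?)",
    [
        ("vp_eng", "deploy blocked: migration needs review before release"),
        ("vp_product", "roadmap update for the phonics onboarding flow"),
        ("vp_eng", "flaky test in the phoneme classifier CI job"),
    ],
)

def route(query: str) -> list[tuple[str, str]]:
    """Return (vp, body) rows matching the query, best match first."""
    return conn.execute(
        "SELECT vp, body FROM messages WHERE messages MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()

hits = route("migration")
```

The point of the sketch is the shape of the operation: full-text match plus rank ordering turns an unstructured message stream into addressable threads, which is what makes per-VP attribution possible.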
02 · Multi-Agent Orchestration
Wave Run v2 — Prose Skill → Airflow DAG
Problem: A 10-phase prose skill for running parallel LLM agents produced 7 distinct, repeated failure patterns. Cost: 4–6 hours of merge cleanup per sprint.
Why it matters: Prose instructions are not enforcement. An LLM with all the orchestration info in context will improvise under pressure. The fix isn't clearer prose — it's removing the LLM from the orchestration decision entirely.
What I built:
Result: 7 documented failure patterns from the prose-skill version eliminated mechanically. 4–6 Diane-hours of merge cleanup per sprint reclaimed.
Stack: Airflow 3.x, Python, Celery, cmux, Postgres + Redis, manifest.yaml + manifest.db
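The core move, deriving the task graph mechanically from a manifest so no LLM makes orchestration decisions at runtime, can be sketched without Airflow. Field names (waves, agents) are illustrative rather than the real manifest.yaml; in the actual system this expansion would feed an Airflow DAG:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    wave: int
    agent: str
    upstream: tuple[str, ...]

def expand(manifest: dict) -> list[Task]:
    """Expand a wave manifest into an explicit task list with dependencies.
    Every task in wave N depends on all tasks in wave N-1 (a merge barrier),
    so ordering is enforced by structure, not by prose instructions."""
    tasks, prev_wave = [], []
    for wave_no, agents in enumerate(manifest["waves"], start=1):
        tasks.extend(Task(wave_no, a, tuple(prev_wave)) for a in agents)
        prev_wave = agents
    return tasks

manifest = {"waves": [["agent_a", "agent_b"], ["merge"]]}
plan = expand(manifest)
```

The same manifest always produces the same plan, which is exactly the property the prose-skill version lacked.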
03 · LLM Cost Optimization
Airflow 5-Tier Watchdog
Problem: A 24/7 monitoring system can't afford an LLM call in every hot path. At 1 incident per 10 min × $0.50/incident, that's $72/day on a system that lives indefinitely.
Why it matters: Most production events are deterministic. Reserve LLM cost for events that genuinely need reasoning. Tier the watchdog so the cheap layers fail-fast and the expensive layer is the exception, not the default.
What I built:
Result: 12 deliverables (D1–D12) shipped + merged 2026-05-01. LLM cost capped at single-digit dollars per week. Diane sits at L4, not L1 — escalation flow ends with the human, doesn't start with them.
Stack: Python (health_check.py, health_fix.py), YAML config, Anthropic Claude routine, Telegram Bot API, cron + plist
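The tiering can be sketched as a fail-fast chain: cheap deterministic checks absorb the common events, and the LLM tier runs only when everything cheaper declines. Tier names, the example checks, and the llm_diagnose stub are all illustrative, not the real health_check.py:

```python
# Hypothetical deterministic tiers; each returns True if it absorbs the event.
def tier1_known_noise(event): return event.get("code") in {"heartbeat", "retry_ok"}
def tier2_auto_fix(event):    return event.get("code") == "stale_lock"  # scripted fix
def tier3_threshold(event):   return event.get("count", 0) < 3         # below alert level

CHEAP_TIERS = [tier1_known_noise, tier2_auto_fix, tier3_threshold]

def handle(event: dict, llm_diagnose) -> str:
    """Return the tier that absorbed the event; call the LLM only as a last resort."""
    for i, tier in enumerate(CHEAP_TIERS, start=1):
        if tier(event):
            return f"L{i}"
    llm_diagnose(event)  # the expensive exception, not the default
    return "L4"
```

At 1 event per 10 minutes, the $72/day worst case only materializes if every event falls through to the LLM tier; with deterministic layers absorbing the bulk, LLM spend stays in single digits per week.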
04 · On-Device ML
Phoneme Classifier — 97% accuracy on-device
Problem: Speech recognition for 4-year-olds, on-device, on a phone, without sending children's audio to any cloud.
Why it matters: Cloud calls would mean COPPA compliance overhead, latency, and ongoing inference cost. Most speech models weigh in at megabytes to gigabytes; the constraint here was accurate phoneme recognition in tens of KB, running on a low-end Android device in under 100 ms.
What I built:
Result: 97.0% accuracy on the full 66-phoneme test set (only 2 confusions, both between phonetically identical pairs). 100% on Diane's voice baseline. Production-shipping in the L2R Android app.
Stack: Python (librosa, scikit-learn, ONNX, PyTorch for Track B), Kotlin (custom PhonemeClassifier on Android)
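A back-of-envelope sketch of why a linear phoneme classifier fits in tens of KB. The 40-coefficient MFCC-style feature dimension is an assumption, and the random weights stand in for a trained model; the shipped classifier may differ:

```python
import numpy as np

# 66 phoneme classes x 40 features, float32: the whole model is ~10.6 KB.
N_PHONEMES, N_FEATURES = 66, 40

rng = np.random.default_rng(0)
W = rng.standard_normal((N_FEATURES, N_PHONEMES)).astype(np.float32)
b = np.zeros(N_PHONEMES, dtype=np.float32)

def classify(features: np.ndarray) -> int:
    """Return the argmax phoneme index for one feature vector."""
    return int(np.argmax(features @ W + b))

model_bytes = W.nbytes + b.nbytes  # 40*66*4 + 66*4 = 10,824 bytes
```

One matrix multiply per inference is also why sub-100 ms latency on a low-end phone is plausible: the cost is dominated by feature extraction, not the classifier itself.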
05 · End-to-End ML Pipeline
Pose Extraction → Find the Sound
Problem: Animations for a kids' phonics app need expressive face/body params for 10 animals. No animator on staff. Hand-keyframing 10 animals × 9 gestures = months I don't have.
Why it matters: Product surface depends on animation fidelity — kids respond to expressive characters. The pipeline has to take human-recorded reference video as input and produce Rive-ready params as output, automatically, per-animal.
What I built:
Result: Production-shipping in L2R V8 release (Play Store gate passed). 5 base videos × multiple animals × 9 gestures, all auto-derived. Reproducible via one batch command.
Stack: Python (RTMLib 2D landmarks, BABEL/AMASS, savgol smoothing, RDP), Rive runtime (Kotlin Android), JSON manifests, MCP-driven .riv emission
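The keyframe-reduction step can be sketched as a pure-Python Ramer–Douglas–Peucker (RDP) pass over a (time, value) curve: dense per-frame samples collapse into the few keyframes an animation runtime needs. The real pipeline applies Savitzky–Golay smoothing upstream and works on multi-channel pose params, both omitted here:

```python
def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == dy == 0:
        return ((x - x1) ** 2 + (y - y1) ** 2) ** 0.5
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / (dx * dx + dy * dy) ** 0.5

def rdp(points, epsilon):
    """Recursively drop points closer than epsilon to the current chord."""
    if len(points) < 3:
        return points
    dists = [_point_line_dist(p, points[0], points[-1]) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[i - 1] <= epsilon:
        return [points[0], points[-1]]
    return rdp(points[: i + 1], epsilon)[:-1] + rdp(points[i:], epsilon)

# A flat segment followed by a ramp reduces to three keyframes.
curve = [(t, 0.0) for t in range(5)] + [(t, t - 4.0) for t in range(5, 10)]
keys = rdp(curve, epsilon=0.01)
```

Epsilon is the fidelity knob: raising it trades animation detail for fewer keyframes, which is what makes the one-command batch derivation tractable across 10 animals and 9 gestures.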