Diane Doten
Staff Data Scientist / ML
PhD, Berkeley · 10yr industry · 15yr data · Minneapolis, MN
Machine learning engineer and data science leader. I architect, build, and deploy scalable ML systems from the ground up: on-device models, multi-agent orchestration, and data products that integrate directly into stakeholder workflows. The six systems below were built solo in 2–4 hour daily windows. Most are not on my résumé, but they're where my recent hours have gone.

A solo-built phonics app for ages 2–6. I own the full stack: on-device ML (phoneme classifier), character animation (gesture pipeline → Rive runtime), and the Android app itself.

View the app →
Diagram: human video → Character Scaler (per-animal manifest; scale + emit) → Rive output. ~1200 MCP calls/animal; 9 gestures, auto-derived.
01 · Character Animation · Rive
Teaching Animals to Gesture
From hand-tuning offsets in Rive to a data-driven, config-driven pipeline that turns my own gesture videos into fully-rigged animal characters. Three layers: pose-based extraction · manifest-driven wiring · validators as testable contracts.
6 animals shipped (10 planned)
9 gesture states each
~30 params / gesture from video
0 code changes per new animal
Problem: Kids' app needs expressive gestures for 10 animals × 9 states. No animator. Hand-tuning offsets per animal doesn't scale, and the magnitudes felt arbitrary — numbers I picked, not numbers the body actually does.
Why it matters: Three problems stack: where do the magnitudes come from, how does the pipeline scale to a 10th animal, and how do you know what shipped is any good?
  • Data-driven: RTMLib 133-keypoint pose extraction from my own gesture videos → 95th-percentile delta from a universal rest pose → ~30 measured parameters per gesture in gesture_schema_v3.json
  • Config-driven: per-animal manifest declares the contract (artboard, ViewModel, group map, viseme inventory). Wiring step iterates the schema dynamically — no gesture name is hardcoded. Kotlin reads the same state-machine names
  • Validators as contracts: 4-dimension score (Spec / Silhouette / Distinctiveness / Safety) + 13 geometry-and-motion validators. The eval landed first; parameter sweeps are the next loop
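The data-driven step above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: it assumes keypoints arrive as per-frame (x, y) tuples, that "delta" means the absolute offset from the rest pose, and it uses the standard library where the real pipeline uses NumPy. The names `p95` and `gesture_params` are hypothetical.

```python
from statistics import quantiles


def p95(values):
    """95th percentile of a list of numbers (needs at least two values)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(values, n=100, method="inclusive")[94]


def gesture_params(frames, rest_pose):
    """Measure how far each keypoint travels from rest during a gesture.

    frames:    list of {keypoint_name: (x, y)}, one dict per video frame
    rest_pose: {keypoint_name: (x, y)} for the universal rest pose
    Returns {keypoint_name: (dx95, dy95)}.
    Assumption: "delta" is the absolute per-axis offset from rest.
    """
    params = {}
    for name, (rest_x, rest_y) in rest_pose.items():
        dx = [abs(frame[name][0] - rest_x) for frame in frames]
        dy = [abs(frame[name][1] - rest_y) for frame in frames]
        params[name] = (p95(dx), p95(dy))
    return params
```

Taking the 95th percentile rather than the maximum keeps a single jittery frame from setting the magnitude of the whole gesture.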
Result: Ships in L2R V8 (Play Store). Adding a new gesture is a recording. Adding a new animal is a manifest.
Stack: Python (RTMLib, NumPy, savgol smoothing, pytest validators), Rive runtime (Kotlin Android), JSON manifests, Rive scripting for keyframe emission
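A rough sketch of what "a new animal is a manifest" could look like in code. The layout of the schema and manifest here is assumed for illustration (the real `gesture_schema_v3.json` structure is not shown in this page), and `wire_gestures` is a hypothetical name. The point is that the loop discovers gesture names from the schema instead of naming them.

```python
def wire_gestures(schema, manifest):
    """Return (gesture, rive_group, param, value) tuples for one animal.

    Assumed shapes (hypothetical):
      schema   = {"gestures": {gesture: {joint: {param: value}}}}
      manifest = {"animal": str, "group_map": {joint: rive_group}}
    """
    group_map = manifest["group_map"]
    wiring = []
    # Gesture names come from the schema; none are hardcoded here.
    for gesture, joints in schema["gestures"].items():
        for joint, params in joints.items():
            if joint not in group_map:
                # Fail loudly: the manifest is the contract.
                raise KeyError(
                    f"{manifest['animal']}: no Rive group mapped for joint '{joint}'"
                )
            for param, value in params.items():
                wiring.append((gesture, group_map[joint], param, value))
    return wiring
```

With this shape, a missing mapping surfaces as an error at wiring time instead of as a silently unanimated limb.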
Open full pipeline →
Diagram: child audio (waveform in) → MFCC + cosine classify → 66 phonemes. 15.4KB, on-device; 97% accuracy; <100ms on Android.
02 · On-Device ML · Android
97% Phonics Accuracy, No Cloud
On-device phoneme classifier for 4-year-olds. No cloud, no COPPA overhead: a 15.4KB asset that runs in <100ms on a low-end Android device, grounded in child-voice acoustics research.
97% accuracy, 66 phonemes
15.4KB model asset size
<100ms on-device latency
0 cloud calls
Problem: Speech recognition for 4-year-olds, on-device. No audio sent to any cloud.
Why it matters: Cloud = COPPA overhead + latency + cost. Constraint: ~15KB asset, <100ms on a low-end Android device.
  • Hierarchical classifier — 5-way manner first, then phoneme
  • VTLN factor 1.104 from child F0 mean (269 Hz) — Lee, Potamianos & Narayanan 1999
  • Sander 1972 substitution table: /t/→/k/ at ages 2–3 is development, not error
  • 4-gate noise rejector for bedroom-recording realism
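The manner-then-phoneme idea can be illustrated with a toy nearest-centroid classifier. This is a sketch under assumptions: it treats each class as a single centroid vector compared by cosine similarity, uses two manner classes instead of five, and takes feature vectors as plain lists where the real system uses MFCCs. All function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(vec, centroids):
    """Label of the centroid most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def classify(vec, manner_centroids, phoneme_centroids):
    """Stage 1 picks the manner class; stage 2 searches only that class's phonemes."""
    manner = nearest(vec, manner_centroids)
    phoneme = nearest(vec, phoneme_centroids[manner])
    return manner, phoneme
```

Splitting the decision this way means the second stage compares against a handful of phonemes rather than all 66, which is one plausible way to stay inside a small size and latency budget.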
Result: 97% on full test set. Ships in production L2R app.
Stack: Python (librosa, scikit-learn, ONNX), Kotlin (PhonemeClassifier on Android)
Full classifier breakdown →

A daily FX forecasting and paper-trading system. Three-model ensemble, walk-forward validated, drift-triggered re-tuning.

Chart: JPY/USD · 30d forecast · SELL target · ADX 21.4 (trending_up) · Sharpe 2.77
06 · Forecasting · Time Series · MLOps
FX Forecasting on a Laptop
Daily ensemble forecaster (SARIMA + Prophet + LightGBM) for currency pairs. Walk-forward validated. Drift-triggered re-tuning. Two pairs live in paper trading.
Python · statsmodels · Prophet · LightGBM · Optuna · SQLite · Streamlit · LaunchAgent
Problem: Build an FX forecaster and paper-trade pipeline that runs unattended on a laptop with no cloud budget.
  • 3-model ensemble — SARIMA + Prophet + LightGBM, horizon-adaptive weights
  • Optuna tuning cut MAPE by 60–80% across all pairs
  • ADX regime gate took JPY Sharpe 1.18 → 1.49 (+26%)
  • Drift-triggered re-tune (1.5× × 2 weeks) — 71% less compute
  • Daily LaunchAgent + weekly retrain + Streamlit dashboard
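One common way to make ensemble weights horizon-adaptive is to weight each model by the inverse of its validation error at that horizon. The sketch below shows that scheme as an assumption; the page does not say how the actual weights are computed, and the function names and error numbers are hypothetical.

```python
def horizon_weights(errors_by_model, horizon):
    """Inverse-error weights for one forecast horizon; weights sum to 1.

    errors_by_model: {model_name: {horizon_days: validation_error}}
    Assumes every error is strictly positive.
    """
    inverse = {model: 1.0 / errs[horizon] for model, errs in errors_by_model.items()}
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}


def ensemble_forecast(forecasts, weights):
    """Weighted average of per-model point forecasts."""
    return sum(weights[model] * forecasts[model] for model in forecasts)


# Hypothetical walk-forward errors: SARIMA best at 1 day, LightGBM best at 30 days.
errors = {
    "sarima": {1: 1.0, 30: 4.0},
    "prophet": {1: 2.0, 30: 2.0},
    "lightgbm": {1: 4.0, 30: 1.0},
}
weights_1d = horizon_weights(errors, 1)
weights_30d = horizon_weights(errors, 30)
```

The same three models end up with different weights at 1 day and 30 days, so no single model has to be good across every horizon.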
Stack: Python (statsmodels, Prophet, LightGBM, Optuna), SQLite, Streamlit, macOS LaunchAgent, Slack webhook
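A drift trigger of the "1.5× × 2 weeks" kind might look like the sketch below. The reading used here is an assumption: re-tune only when the weekly forecast error has exceeded 1.5× the baseline error for two consecutive weeks. The function name and arguments are hypothetical.

```python
def should_retune(weekly_errors, baseline_error, ratio=1.5, weeks=2):
    """True when the most recent `weeks` errors all exceed ratio * baseline.

    weekly_errors: oldest-to-newest list of weekly error values (e.g. MAPE)
    Assumption: drift means sustained error inflation, not a one-week spike.
    """
    if len(weekly_errors) < weeks:
        return False  # not enough history to call it drift
    threshold = ratio * baseline_error
    return all(error > threshold for error in weekly_errors[-weeks:])
```

Requiring consecutive breaches skips re-tuning after a single noisy week, which is where a compute saving over a fixed re-tune schedule would come from.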
Open the deep-dive →

Agent message routing, multi-agent orchestration via Airflow DAG, and a cost-tiered monitoring watchdog.

Diagram: 9-class classifier → classify → route → 8 VP agents (e.g. VP Systems, VP ML, VP Mobile)
03 · LLM Context Engineering
Routing 500 Agent Messages/Day
A 9-class message classifier that routes ~500 daily agent messages to the right VP automatically, with per-session token cost attribution.
~500 messages/day
9 classifier classes
8 VP agents routed
Problem: ~500 unstructured messages/day across 8 VP agents. Without routing, every message is a context-switch tax.
  • 9-class message classifier with routing rules engine
  • Post-classification flow dashboard — system map of where threads go
  • Before/after token-cost timeline per VP per sprint phase
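The classify-then-route step can be pictured as a lookup from predicted class to owner, with a fallback for anything unmapped. The class labels, the fallback name, and the helper functions below are hypothetical; only the VP names come from the diagram on this page.

```python
from collections import Counter

# Hypothetical class labels. The real classifier has 9 classes and 8 VP targets.
ROUTES = {
    "infra_alert": "vp_systems",
    "model_eval": "vp_ml",
    "app_crash": "vp_mobile",
}


def route(label, routes=ROUTES, fallback="human_triage"):
    """Map a predicted class label to the agent that should receive the message."""
    return routes.get(label, fallback)


def flow_counts(labels, routes=ROUTES):
    """Count messages per destination, the raw data behind a flow dashboard."""
    return Counter(route(label, routes) for label in labels)
```

Keeping the rules in a table rather than in branching code makes the routing itself something a dashboard can display and a test can assert on.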
Result: Per-session cost attributable per VP, per phase. Routing replaces inbox triage.
Stack: Python (classifier), SQLite FTS5, Flask API
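Per-VP, per-phase cost attribution reduces to a grouped sum over usage events. A minimal sketch, assuming each event records a VP, a sprint phase, and a token count, and that a single flat price per 1,000 tokens applies; the event fields and `attribute_cost` are hypothetical.

```python
from collections import defaultdict


def attribute_cost(events, usd_per_1k_tokens):
    """Sum token cost by (vp, phase).

    events: iterable of {"vp": str, "phase": str, "tokens": int}
    Assumption: one flat price; real pricing may differ for input and output tokens.
    """
    totals = defaultdict(float)
    for event in events:
        cost = event["tokens"] / 1000 * usd_per_1k_tokens
        totals[(event["vp"], event["phase"])] += cost
    return dict(totals)
```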
Open routing dashboard →
Diagram: 10 mechanical phase gates, including entry contract · manifest + forum · preflight lint+fix · worktree gate · dispatch + validate · ownership approval · handoff freshness · output QA. Coordinator has no git access; enforcement is mechanical. 7 failure patterns eliminated.
04 · Multi-Agent Orchestration
7 Agent Failures, Eliminated
A 10-phase prose skill converted to an Airflow DAG. LLM removed from orchestration decisions. 7 documented failure patterns eliminated mechanically.
7 failure patterns gone
10 mechanical phase gates
4–6hr per sprint reclaimed
Problem: Prose instructions are not enforcement. LLMs improvise under pressure. 7 failure patterns proved it.
  • 10-phase skill codified into wave_run_v2 Airflow DAG
  • Phase gates: entry contract → manifest → lint+autofix → ownership → worktree gate → dispatch → handoff sensor → output QA
  • Coordinator (DAG) has no git access — separation is mechanical
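The core of mechanical enforcement is that a failed gate stops the run and nothing downstream can talk its way past it. The sketch below is plain Python, not Airflow code, and is only meant to show that fail-closed shape. Gate names follow the eight listed above (the full DAG has ten phases); `run_gates` and the check callables are hypothetical.

```python
GATE_ORDER = [
    "entry_contract",
    "manifest",
    "lint_autofix",
    "ownership",
    "worktree_gate",
    "dispatch",
    "handoff_sensor",
    "output_qa",
]


def run_gates(context, checks, order=GATE_ORDER):
    """Run gates in order and stop at the first failure.

    checks: {gate_name: callable(context) -> bool}
    A gate with no registered check raises KeyError, so it cannot be skipped silently.
    """
    passed = []
    for name in order:
        if not checks[name](context):
            return {"status": "blocked", "failed_gate": name, "passed": passed}
        passed.append(name)
    return {"status": "complete", "failed_gate": None, "passed": passed}
```

In an orchestrator the same idea is expressed as task dependencies: a downstream task is never scheduled if its upstream gate did not succeed.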
Result: 7 failure patterns eliminated. 4–6 Diane-hours of merge cleanup per sprint reclaimed.
Stack: Airflow 3.x, Python, Celery, Postgres + Redis
DAG walkthrough →
Diagram: L1 · 61 deterministic probes ($0) → L2 · YAML repair recipes ($0) → L3 · Claude webhook ($0.10) → L4 · Telegram tap (human) → L5 · weekly digest. Diane sits at L4: escalation ends with a human, doesn't start there.
05 · LLM Cost Optimization
$0 for Most Incidents
A 5-tier escalation watchdog where L1–L2 handle most incidents deterministically at $0. LLM only invoked when they fail. Human at L4, not L1.
61 deterministic probes
~$0.10 typical LLM cost / incident
12 deliverables shipped
Problem: 24/7 monitoring can't afford an LLM in every hot path. At 1 incident every 10 min (144/day) × $0.50, that's $72/day, indefinitely.
  • L1: 61-probe deterministic checks — $0/incident, 24/7
  • L2: YAML repair recipes keyed by probe ID — $0/incident
  • L3: Claude webhook for L2 misses — $0.05–0.20/incident
  • L4: Telegram tap-to-approve — human time only
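The escalation order above can be sketched as a function that tries the cheapest tier first and returns which tier resolved the incident. This is an illustration under assumptions: an L1 probe has already fired and supplied a probe ID, recipes are callables that return True on a successful repair, and the LLM step reports its own cost. All names and signatures are hypothetical.

```python
def handle_incident(probe_id, recipes, llm_fix, notify_human):
    """Escalate L2 -> L3 -> L4 and return (resolving_tier, llm_cost_usd).

    recipes:      {probe_id: callable() -> bool}   deterministic repairs, $0
    llm_fix:      callable(probe_id) -> (fixed: bool, cost_usd: float)
    notify_human: callable(probe_id)               e.g. a tap-to-approve message
    """
    recipe = recipes.get(probe_id)
    if recipe is not None and recipe():
        return ("L2", 0.0)  # fixed deterministically, no LLM call

    fixed, cost = llm_fix(probe_id)  # only reached when L2 misses
    if fixed:
        return ("L3", cost)

    notify_human(probe_id)  # last resort: a person decides
    return ("L4", cost)
```

Because the LLM call sits behind the recipe lookup, its cost scales with the number of L2 misses rather than with the number of incidents.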
Result: 12 deliverables merged 2026-05-01. LLM capped at single-digit dollars/week.
Stack: Python (health_check.py, health_fix.py), YAML, Anthropic Claude, Telegram Bot API
Open the watchdog →