01 · LLM Cost / Context Engineering
Thread Routing & Context Optimization
Problem: A multi-agent system produced hundreds of unstructured messages a day across 8 VP agents — review time became the bottleneck.
Why it matters: Without classification + routing, every message is a context-switch tax. With it, only the right thread reaches the right person, and per-session token cost becomes attributable.
What I built:
- 9-class message classifier with routing rules
- Post-classification flow dashboard (system map of where threads go)
- Before/after token-cost timeline per VP per sprint phase
- Local-extraction script that compresses prior-sprint state into the next session's context
Result: Per-session token cost made visible and attributable per VP, per sprint phase. Routing replaces inbox-style triage — only the relevant thread reaches the right person.
Stack: Python (classifier), SQLite (forum.db with FTS5), Flask (localhost:5556 API), HTML/CSS/JS
02 · Multi-Agent Orchestration
Wave Run v2 — Prose Skill → Airflow DAG
Problem: A 10-phase prose skill for running parallel LLM agents produced 7 distinct, repeated failure patterns. Cost: 4–6 hours of merge cleanup per sprint.
Why it matters: Prose instructions are not enforcement. An LLM with all the orchestration info in context will improvise under pressure. The fix isn't clearer prose — it's removing the LLM from the orchestration decision entirely.
What I built:
- Codified the 10-phase skill into a wave_run_v2 Airflow DAG
- Mechanical phase gates: entrypoint contract → manifest load → forum bind → preflight lint with autofix → ownership check → typed approval → worktree gate → dispatch → freshness-validated handoff → output validation
- Coordinator (DAG) has no git access; orchestrator (separate session) does — separation is mechanical, not by convention
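The gate sequence above can be sketched as a fail-fast chain. This is plain Python standing in for the Airflow task graph; the two gate bodies are stubs (the state keys and failure messages are invented), but the shape is the point: any gate that raises stops the run before dispatch.

```python
# Minimal fail-fast gate chain, standing in for the Airflow task graph.
# Gate names mirror two of the ten phases; bodies are illustrative stubs.
class GateFailure(Exception):
    pass

def preflight_lint(state: dict) -> dict:
    # Autofix what we can; hard-fail on what we can't.
    state.setdefault("lint_clean", True)
    if not state["lint_clean"]:
        raise GateFailure("lint")
    return state

def typed_approval(state: dict) -> dict:
    # A typed token, not a yes/no prompt: the approver must echo the run id.
    if state.get("approval") != state.get("run_id"):
        raise GateFailure("approval")
    return state

GATES = [preflight_lint, typed_approval]  # the real DAG chains all ten

def run(state: dict) -> dict:
    for gate in GATES:
        state = gate(state)  # any raise stops the run before dispatch
    return state
```

Because each gate is a task, not a prompt instruction, the LLM never gets the chance to improvise past a failed check.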
Result: 7 documented failure patterns from the prose-skill version eliminated mechanically. 4–6 Diane-hours of merge cleanup per sprint reclaimed.
Stack: Airflow 3.x, Python, Celery, cmux, Postgres + Redis, manifest.yaml + manifest.db
03 · LLM Cost Optimization
Airflow 5-Tier Watchdog
Problem: A 24/7 monitoring system can't afford an LLM call in every hot path. At 1 incident per 10 min × $0.50/incident, that's $72/day on a system that lives indefinitely.
Why it matters: Most production events are deterministic. Reserve LLM cost for events that genuinely need reasoning. Tier the watchdog so the cheap layers fail-fast and the expensive layer is the exception, not the default.
What I built:
- L1 probes — 24/7 deterministic checks ($0/incident, 61-probe catalog)
- L2 recipes — predefined repair scripts ($0/incident, YAML-keyed)
- L3 webhook — Claude routine for the L2 misses ($0.05–0.20/incident)
- L4 Telegram tap — human approval for proposed patches
- L5 weekly digest — batched unresolved incidents
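The tier ladder can be sketched as an ordered walk. Probe and recipe names and the per-tier costs here are stand-ins; the real system has a 61-probe catalog and YAML-keyed recipes. The point is the ordering: deterministic layers answer first, and the LLM tier is reached only when everything cheaper has passed.

```python
# Illustrative tier ladder. Incident kinds, probe logic, and costs are
# invented; the structure (cheap-first, LLM-last, human at the end) is real.
COSTS = {"L1": 0.0, "L2": 0.0, "L3": 0.12}  # $/incident; L3 is a midpoint

def l1_probe(incident: dict) -> bool:
    return incident.get("kind") == "disk_full"           # deterministic check

def l2_recipe(incident: dict) -> bool:
    return incident.get("kind") in {"stale_lock", "orphan_worker"}

def l3_llm(incident: dict) -> bool:
    return True  # placeholder for the Claude routine; always "handles" here

def handle(incident: dict) -> tuple[str, float]:
    """Walk the tiers; return (tier that resolved it, cost incurred)."""
    for tier, fn in [("L1", l1_probe), ("L2", l2_recipe), ("L3", l3_llm)]:
        if fn(incident):
            return tier, COSTS[tier]
    return "L4", 0.0  # human approval; unresolved items batch into the L5 digest
```

Since most incidents terminate at L1 or L2, expected cost per incident collapses toward zero even though the worst case still has full reasoning available.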
Result: 12 deliverables (D1–D12) shipped + merged 2026-05-01. LLM cost capped at single-digit dollars per week. Diane sits at L4, not L1 — escalation flow ends with the human, doesn't start with them.
Stack: Python (health_check.py, health_fix.py), YAML config, Anthropic Claude routine, Telegram Bot API, cron + plist
04 · On-Device ML
Phoneme Classifier — 97% accuracy on-device
Problem: Speech recognition for 4-year-olds, on-device, on a phone, without sending any audio of children to the cloud.
Why it matters: Cloud calls would mean COPPA compliance overhead, latency, and ongoing inference cost. Most speech models are MB-to-GB. The constraint: fit accurate phoneme recognition in tens of KB that runs on a low-end Android in <100ms.
What I built:
- Two-track classifier — Track A (MFCC + cosine, 15.4KB asset), Track B (WavLM fine-tune, ONNX export)
- VTLN factor 1.104 derived from child F0 mean (269 Hz) — adult-trained features warped to child vocal tract
- Sander 1972 substitution table for age-gated developmental allowances
- 4-gate noise rejector for bedroom-recording realism
- Confusion matrix audit, 120 unit tests across 9 files
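Track A's decision rule can be sketched as nearest centroid by cosine similarity over MFCC vectors, with a linear VTLN warp on the frequency axis. The centroid values, 3-dim vectors, and phoneme labels below are made up (the real asset holds 66 phoneme centroids in 15.4KB), and the warp direction shown is one modeling choice for mapping adult-trained features toward a child vocal tract.

```python
import math

# Sketch of Track A: linear VTLN warp + nearest-centroid cosine matching.
# Centroids and vectors are toy values, not the shipped 66-phoneme asset.
ALPHA = 1.104  # VTLN factor derived from child F0 mean (269 Hz)

def warp_freq(f_hz: float, alpha: float = ALPHA) -> float:
    """Linear VTLN: rescale the adult-trained frequency axis by alpha."""
    return f_hz / alpha

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

CENTROIDS = {"b": [1.0, 0.2, 0.0], "p": [0.9, 0.4, 0.1], "m": [0.0, 1.0, 0.5]}

def classify_mfcc(vec: list[float]) -> str:
    """Return the phoneme whose centroid is most cosine-similar to vec."""
    return max(CENTROIDS, key=lambda ph: cosine(vec, CENTROIDS[ph]))
```

Nearest-centroid over a few dozen small vectors is why the asset stays in the tens of KB and the per-utterance decision fits well inside a 100ms budget on low-end hardware.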
Result: 97.0% accuracy on the full 66-phoneme test set (only 2 phonetically identical confusions). 100% on Diane's voice baseline. Production-shipping in the L2R Android app.
Stack: Python (librosa, scikit-learn, ONNX, PyTorch for Track B), Kotlin (custom PhonemeClassifier on Android)
05 · End-to-End ML Pipeline
Pose Extraction → Find the Sound
Problem: Animations for a kids' phonics app need expressive face/body params for 10 animals. No animator on staff. Hand-keyframing 10 animals × 9 gestures = months I don't have.
Why it matters: Product surface depends on animation fidelity — kids respond to expressive characters. The pipeline has to take human-recorded reference video as input and produce Rive-ready params as output, automatically, per-animal.
What I built:
- End-to-end pipeline: human video → 2D pose → CharacterScaler per-animal → Rive runtime params
- Video-as-floor, MoCap-as-ceiling: Diane's verified videos set the magnitude floor; BABEL/AMASS MoCap can add but not override
- Per-animal scaling from manifest (not hardcoded) — adding the 10th animal requires zero code changes
- Ships keyframes via Rive MCP (~1200 calls per animal); post-wire structural eval
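The manifest-driven scaling step might look like the sketch below. The field names ("limb_scale", "head_gain") and the animal entries are invented; the real point is that per-animal behavior lives in data, so a new animal is a new manifest entry, not new code.

```python
# Hypothetical CharacterScaler core: manifest fields and animals are
# invented stand-ins for the real per-animal JSON manifests.
MANIFEST = {
    "bear": {"limb_scale": 0.7, "head_gain": 1.3},
    "frog": {"limb_scale": 1.2, "head_gain": 0.9},
}

def scale_params(animal: str, pose: dict[str, float]) -> dict[str, float]:
    """Map human-derived pose params into one animal's Rive-ready ranges."""
    cfg = MANIFEST[animal]
    return {
        "arm_swing": pose["arm_swing"] * cfg["limb_scale"],
        "head_tilt": pose["head_tilt"] * cfg["head_gain"],
    }
```

Keeping the scaling table in the manifest is also what makes the batch command reproducible: the same reference video plus the same manifest always yields the same Rive params.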
Result: Production-shipping in L2R V8 release (Play Store gate passed). 5 base videos × multiple animals × 9 gestures, all auto-derived. Reproducible via one batch command.
Stack: Python (RTMLib 2D landmarks, BABEL/AMASS, savgol smoothing, RDP), Rive runtime (Kotlin Android), JSON manifests, MCP-driven .riv emission