05 · End-to-End ML Pipeline
Pose Extraction → Find the Sound
Problem: Animations for a kids' phonics app need expressive face/body params for 10 animals. No animator on staff. Hand-keyframing 10 animals × 9 gestures = months I don't have.
Why it matters: The product experience hinges on animation fidelity; kids respond to expressive characters. The pipeline has to take human-recorded reference video as input and produce Rive-ready params as output, automatically, per animal.
What I built:
- End-to-end pipeline: human video → 2D pose → per-animal CharacterScaler → Rive runtime params
- Video-as-floor, MoCap-as-ceiling: Diane's verified videos set the magnitude floor; BABEL/AMASS MoCap can add on top but never override it (see the merge sketch after this list)
- Per-animal scaling read from a JSON manifest, not hardcoded: adding the 10th animal requires zero code changes (manifest sketch below)
- Ships keyframes via Rive MCP (~1200 calls per animal); a structural eval runs post-wire
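
A minimal sketch of the video-as-floor / MoCap-as-ceiling merge, assuming per-frame gesture magnitude envelopes have already been extracted as NumPy arrays of equal length (`merge_envelopes`, `video_env`, and `mocap_env` are illustrative names, not the pipeline's actual identifiers):

```python
import numpy as np

def merge_envelopes(video_env: np.ndarray, mocap_env: np.ndarray) -> np.ndarray:
    """Merge two per-frame magnitude envelopes of equal length.

    The verified reference video sets the floor: MoCap-derived motion may
    raise a frame's magnitude above it, but can never pull it below what
    the video showed.
    """
    return np.maximum(video_env, mocap_env)
```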
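A sketch of the manifest-driven per-animal scaling under the same caveat: the manifest layout and field names below are assumptions for illustration, not the shipped schema.

```python
import json
from pathlib import Path

class CharacterScaler:
    """Scales normalized pose params into one animal's Rive parameter ranges.

    Every per-animal number comes from the manifest; nothing is hardcoded,
    so adding an animal is a manifest entry, not a code change.
    """

    def __init__(self, manifest_path: Path, animal: str) -> None:
        manifest = json.loads(manifest_path.read_text())
        self.scales: dict[str, float] = manifest[animal]["scales"]

    def apply(self, params: dict[str, float]) -> dict[str, float]:
        # Params missing from the manifest pass through unscaled (factor 1.0).
        return {name: value * self.scales.get(name, 1.0)
                for name, value in params.items()}
```

Adding a new animal is then a new manifest block (e.g. a hypothetical `{"elephant": {"scales": {"head_tilt": 0.6}}}`) plus a rerun of the batch command, with no code edits.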
Result: Shipping in production in the L2R V8 release (Play Store gate passed). 5 base videos × multiple animals × 9 gestures, all auto-derived. Reproducible via a single batch command.
Stack: Python (RTMLib 2D landmarks, BABEL/AMASS, Savitzky-Golay smoothing, RDP simplification), Rive runtime (Kotlin Android), JSON manifests, MCP-driven .riv emission
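
To show how the Savitzky-Golay and RDP pieces of that stack fit together, here is a hedged sketch that smooths one raw per-frame parameter track and thins it to keyframes before emission. The window, polyorder, and epsilon values are placeholders, and `rdp_keyframes` / `reduce_curve` are illustrative names rather than the production functions.

```python
import numpy as np
from scipy.signal import savgol_filter

def rdp_keyframes(times: np.ndarray, values: np.ndarray, epsilon: float) -> list[int]:
    """Indices kept by a Ramer-Douglas-Peucker pass over a 1-D curve.

    Distance is measured vertically from the chord between the endpoints,
    which is adequate for a strictly increasing time axis.
    """
    def _rdp(lo: int, hi: int) -> list[int]:
        if hi - lo < 2:
            return [lo, hi]
        slope = (values[hi] - values[lo]) / (times[hi] - times[lo])
        chord = values[lo] + slope * (times[lo + 1:hi] - times[lo])
        dists = np.abs(values[lo + 1:hi] - chord)
        k = lo + 1 + int(np.argmax(dists))
        if dists[k - lo - 1] > epsilon:
            left = _rdp(lo, k)
            right = _rdp(k, hi)
            return left[:-1] + right  # drop the duplicated split point
        return [lo, hi]

    return _rdp(0, len(times) - 1)

def reduce_curve(values: np.ndarray, fps: float, epsilon: float = 0.01) -> list[tuple[float, float]]:
    """Smooth a raw per-frame parameter track, then keep only RDP keyframes."""
    smoothed = savgol_filter(values, window_length=9, polyorder=3)  # placeholder settings
    times = np.arange(len(values)) / fps
    keep = rdp_keyframes(times, smoothed, epsilon)
    return [(float(times[i]), float(smoothed[i])) for i in keep]
```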