PROBLEM: A speech model trained on adult corpora misclassifies a 4-year-old. Children have higher fundamental frequency, shorter vocal tracts, and different formant ratios. The model needs ground-truth child recordings — but COPPA-friendly child speech datasets don't exist for this use case.
WHY IT MATTERS: No anonymous internet "child speech" dataset is COPPA-safe to train on. The pragmatic answer: I record my own children, label every clip myself, and rely on parental consent, which is mine to give. This is the dataset behind the VTLN normalization factor, the developmental substitution table, and the 97.4% across-manner accuracy.
WHAT YOU'RE LISTENING TO: Diane-verified phoneme recordings — adult prompt + child response, captured at age 2 and again at age 4. Each row is a letter; each column is one of the 4 voices. Diane labels which clips passed verification. F0 (pitch) is shown when measured.
STACK: iPhone Voice Memos for capture; librosa's pyin (pYIN) for F0 extraction; JSON manifests for per-clip labels and prompt/response pairing.
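
A minimal sketch of how the stack above could fit together, not the project's actual code. It assumes one manifest entry per letter and age, holding an adult prompt clip and a child response clip. The field names (`letter`, `age`, `prompt`, `response`, `verified`, `f0_hz`), the file names, and the `fmin`/`fmax` search bounds are all hypothetical; only `librosa.load` and `librosa.pyin` are real library calls. The F0 values in the usage lines are placeholders borrowed from the approximate figures quoted on this page, not measurements of any specific clip.

```python
import json
import math
from statistics import median


def median_f0(f0_frames):
    """Median F0 over voiced frames; pyin reports unvoiced frames as NaN."""
    voiced = [f for f in f0_frames if f is not None and not math.isnan(f)]
    return round(median(voiced), 1) if voiced else None


def extract_f0(path, fmin=80.0, fmax=800.0):
    """Estimate a clip's median F0 with librosa.pyin.

    fmin/fmax are assumed bounds wide enough for adult and child voices.
    Voice Memos files may need converting to WAV, depending on the
    audio backend librosa has available.
    """
    import librosa  # imported lazily so the manifest helpers run without it

    y, sr = librosa.load(path, sr=None, mono=True)
    f0, _voiced_flag, _voiced_prob = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    return median_f0(f0.tolist())


def clip(file, f0_hz=None):
    """One recording; f0_hz stays None when pitch was not measured."""
    return {"file": file, "f0_hz": f0_hz}


def make_entry(letter, age, prompt, response, verified):
    """Pair an adult prompt with the child response for one letter (hypothetical schema)."""
    return {
        "letter": letter,
        "age": age,
        "prompt": prompt,
        "response": response,
        "verified": verified,
    }


def verified_only(entries):
    """Keep only the pairs that passed listening verification."""
    return [e for e in entries if e["verified"]]


# Illustrative usage with placeholder file names and F0 values.
manifest = [
    make_entry("b", "4yo", clip("b_4yo_adult.m4a", 226.0), clip("b_4yo_child.m4a", 345.0), True),
    make_entry("s", "2yo", clip("s_2yo_adult.m4a"), clip("s_2yo_child.m4a"), False),
]
manifest_json = json.dumps(verified_only(manifest), indent=2)
```

Keeping the prompt and response inside one entry means a pair cannot drift apart when clips are filtered, and a missing `f0_hz` is recorded as an explicit null rather than a guessed number.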

Child Voice Samples — 2yo and 4yo phoneme dataset

Diane-verified ground-truth recordings · adult prompt → child response · pitch-paired
"If you train a speech model on adult corpora and ship it to a 4-year-old, it fails badly — because a child's F0 is ~345 Hz versus an adult's ~226 Hz, and adult formants warp incorrectly. So I built the simplest possible ground-truth: my own kids, recorded over two years, every clip listened to and labeled. This is the dataset that grounds the VTLN factor and the developmental substitution table."
2yo adult prompts (Diane)
2yo child responses
4yo adult prompts (Diane)
4yo child responses

The 4 voices, side by side

2yo adult: Diane prompts a 2-year-old
2yo child: the 2-year-old's response
4yo adult: Diane prompts a 4-year-old
4yo child: the 4-year-old's response
Letter | 2yo · adult | 2yo · child | 4yo · adult | 4yo · child

What this dataset enabled