PhonemeClassifier on Android (MFCC extractor + cosine matcher)

Two parallel classifier tracks were developed and evaluated. Track A is the production-shipping baseline; Track B is the research direction for harder phoneme distinctions.
| True / Predicted | ee | ea | tch | ch | (other 62) | Notes |
|---|---|---|---|---|---|---|
| ee | ✓ | → | — | — | — | "ee" sometimes classified as "ea" |
| ea | → | ✓ | — | — | — | "ea" sometimes classified as "ee" |
| tch | — | — | ✓ | → | — | "tch" sometimes classified as "ch" |
| ch | — | — | → | ✓ | — | "ch" sometimes classified as "tch" |
| (other 62) | — | — | — | — | ✓✓✓ | Perfect diagonal — 62 / 62 correct |
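Track A's matcher is simple enough to sketch: cosine similarity between a recording's feature vector and each phoneme's stored reference profile, taking the best score. A minimal sketch (the real extractor produces MFCC plus spectral features; the function names and toy profiles below are illustrative, not the shipping API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def classify(features: np.ndarray, profiles: dict) -> tuple:
    """Return the best-matching phoneme label and its similarity score."""
    scores = {label: cosine_similarity(features, ref)
              for label, ref in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy 2-D profiles: "ee" and "ea" are deliberately near-identical,
# mirroring the near-ties seen in the confusion matrix above.
profiles = {
    "ee": np.array([1.0, 0.90]),
    "ea": np.array([1.0, 0.88]),
    "ch": np.array([0.0, 1.00]),
}
label, score = classify(np.array([1.0, 0.90]), profiles)  # best match: "ee"
```

With near-identical profiles, tiny acoustic variation is enough to flip the top-1 label between the pair, which is exactly the ee/ea and tch/ch behavior the matrix shows.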
The only confusions are between spellings of the same phoneme: /iː/ for ee/ea, /tʃ/ for tch/ch. A human grader would not mark these wrong. The classifier disambiguates by spelling context in the L2R app, not by acoustic difference.
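One way the spelling-context tiebreak could look (a hypothetical helper; the source states only that spelling context, not acoustics, resolves these pairs — the names and the tie margin here are assumptions):

```python
def resolve_with_context(ranked, target_grapheme, tie_margin=0.02):
    """Break acoustic near-ties using the grapheme the child is reading.

    ranked: list of (label, score) pairs, best score first.
    If a lower-ranked candidate is within tie_margin of the top score and
    matches the target grapheme (e.g. "ee" vs "ea"), prefer it; otherwise
    trust the acoustic ranking.
    """
    best_label, best_score = ranked[0]
    for label, score in ranked[1:]:
        if best_score - score > tie_margin:
            break  # no longer a tie; acoustics win
        if label == target_grapheme:
            return label
    return best_label

# The child is reading an "ee" word; "ea" edges out "ee" acoustically,
# but the margin is within the tie band, so the grapheme decides.
resolve_with_context([("ea", 0.97), ("ee", 0.96)], "ee")  # → "ee"
```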
Adult speech corpora (LibriSpeech, Common Voice) don't represent child voices. Children have higher fundamental frequency, shorter vocal tracts, and different formant ratios. Models trained only on adult data systematically fail on kids.
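One standard mitigation is VTLN: warp the frequency axis before feature extraction so child formants land where adult-trained features expect them. A minimal piecewise-linear sketch, assuming the 1.104 warp factor quoted in the adaptation table below (the app's exact warp shape is not shown, so this formulation is an assumption):

```python
import numpy as np

VTLN_ALPHA = 1.104  # warp factor derived from child F0 mean of ~269 Hz

def vtln_warp_frequencies(freqs_hz, alpha=VTLN_ALPHA, f_max=8000.0):
    """Piecewise-linear VTLN warp.

    Scale frequencies by alpha below a knee frequency, then compress
    linearly above it so that f_max still maps to f_max (keeping the
    warped axis inside the analysis bandwidth).
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    knee = 0.85 * f_max / max(alpha, 1.0)
    return np.where(
        freqs_hz <= knee,
        alpha * freqs_hz,
        alpha * knee + (f_max - alpha * knee) * (freqs_hz - knee) / (f_max - knee),
    )

# Applying this warp to the mel filterbank center frequencies shifts the
# filters so a child's higher formants align with adult reference profiles.
```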
| Adaptation | What it does | Impact |
|---|---|---|
| VTLN (Vocal Tract Length Normalization) | Frequency-warps adult-trained features to match child vocal tract. | Factor 1.104 derived from child F0 mean of 269 Hz. Brings child speech into adult feature space. |
| Sander 1972 substitution table | Age-gated phoneme substitution — accepts /w/ for /r/ at age 3, but not at age 6. | Avoids penalizing developmentally typical pronunciations. |
| 4-gate noise rejector | VAD + SNR + spectral flatness + duration gate before classification. | Rejects bedroom noise, sibling speech, breath noise without false positives on quiet voices. |
| Recording-on-device | Audio never leaves the phone. | COPPA-compliant by architecture: there's no parental consent dance for cloud transmission because there's no transmission. |
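The 4-gate rejector can be sketched as a sequential check. Every threshold below is an illustrative assumption, not the shipping value:

```python
import numpy as np

def passes_gates(frame, sample_rate=16000, vad_energy_floor=1e-4,
                 min_snr_db=10.0, max_flatness=0.5, min_dur_s=0.1,
                 noise_floor=1e-6):
    """Run a frame through the four gates; reject on the first failure."""
    frame = np.asarray(frame, dtype=float)
    # Gate 1: VAD — reject silence and breath noise by RMS energy.
    rms = np.sqrt(np.mean(frame ** 2))
    if rms < vad_energy_floor:
        return False
    # Gate 2: SNR against an (assumed) running noise-floor estimate.
    snr_db = 20 * np.log10(rms / noise_floor)
    if snr_db < min_snr_db:
        return False
    # Gate 3: spectral flatness — broadband noise is flat, speech is peaky.
    mag = np.abs(np.fft.rfft(frame)) + 1e-12
    flatness = np.exp(np.mean(np.log(mag))) / np.mean(mag)
    if flatness > max_flatness:
        return False
    # Gate 4: duration — too-short bursts are clicks, not phonemes.
    if len(frame) / sample_rate < min_dur_s:
        return False
    return True
```

A clean tone passes all four gates, while silence fails the VAD gate and broadband noise fails the flatness gate — the ordering means cheap checks reject most junk before the FFT runs.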
QA scripts:

- phoneme_classifier.py — MFCC + cosine, 100% accuracy on Diane's recordings
- manner_detector.py — manner class detection
- vot_detector.py — voice onset time
- spectral_matcher.py — fricatives + vowels
- confusion_matrix.py — full matrix + HTML report
- generate_reference_profiles.py — builds the 15.4KB asset
- child_speech_test.py — VTLN validation on real child data
- run_full_qa.sh — one-command full QA pipeline

Kotlin classes (com.readingpractice.audio):

- PhonemeFeatureExtractor — MFCC, spectral, ZCR, HF ratio
- PhonemeClassifier — cosine match against profiles, age-adjusted tiers
- AudioCaptureManager — mic capture with VAD
- PhonemeFeedbackEngine — capture → features → classify → feedback
- VTLNCalibrator — child speech normalization
- PhonemeProgressTracker — mastery levels per phoneme
- DevelopmentalSubstitutions — Sander 1972 table
- NoiseRejector — 4-gate noise rejection

The classifier is only as good as the audio it's trained against. This is the ear-check tool I use to verify every one of the 66 sounds shipping in the L2R Android app: phoneme-by-phoneme audio playback, side-by-side comparison against older versions, and an approve/reject UI driven from a phone.
🔊 Open the Phoneme Audio QA Dashboard →
📊 Spectral feature walkthrough — /a/ vs /f/ →

Code locations:

- projects/learn_to_read/scripts/ (Python)
- app/src/main/java/com/readingpractice/audio/ (Kotlin)
- projects/track_a_classical/
- projects/neural_phonics/
- projects/phonics_classifier/