
  1. arXiv:2509.15001

    eess.AS cs.LG cs.SD

    BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings

    Authors: Théo Charlot, Tarek Kunze, Maxime Poli, Alejandrina Cristia, Emmanuel Dupoux, Marvin Lavechin

    Abstract: Child-centered long-form recordings are essential for studying early language development, but existing speech models trained on clean adult data perform poorly due to acoustic and linguistic differences. We introduce BabyHuBERT, the first self-supervised speech representation model trained on 13,000 hours of multilingual child-centered long-form recordings spanning over 40 languages. We evaluate…

    Submitted 18 September, 2025; originally announced September 2025.

    Comments: 5 pages, 1 figure