Breaking the Barriers of Text-Hungry and Audio-Deficient AI
Authors:
Hamidou Tembine,
Issa Bamia,
Massa NDong,
Bakary Coulibaly,
Oumar Issiaka Traore,
Moussa Traore,
Moussa Sanogo,
Mamadou Eric Sangare,
Salif Kante,
Daryl Noupa Yongueng,
Hafiz Tiomoko Ali,
Malik Tiomoko,
Frejus Laleye,
Boualem Djehiche,
Wesmanegda Elisee Dipama,
Idris Baba Saje,
Hammid Mohammed Ibrahim,
Moumini Sanogo,
Marie Coursel Nininahazwe,
Abdul-Latif Siita,
Haine Mhlongo,
Teddy Nelvy Dieu Merci Kouka,
Mariam Serine Jeridi,
Mutiyamuogo Parfait Mupenge,
Lekoueiry Dehah, et al. (9 additional authors not shown)
Abstract:
While global linguistic diversity spans more than 7,164 recognized languages, the currently dominant architecture of machine intelligence remains fundamentally biased toward written text. This bias excludes over 700 million people, particularly in rural and remote regions, who are audio-literate. In this work, we introduce a fully textless, audio-to-audio machine intelligence framework designed to serve this underserved population, as well as anyone who prefers the efficiency of audio. Our contributions include novel audio-to-audio translation architectures that bypass text entirely, including spectrogram-, scalogram-, wavelet-, and unit-based models. Central to our approach is the Multiscale Audio-Semantic Transform (MAST), a representation that encodes tonal, prosodic, speaker, and expressive features. We further integrate MAST into a mean-field-type fractional diffusion framework driven by fractional Brownian motion, which enables the generation of high-fidelity, semantically consistent speech without reliance on textual supervision. The result is a robust and scalable system capable of learning directly from raw audio, even in languages that are unwritten or rarely digitized. This work represents a fundamental shift toward audio-native machine intelligence systems, expanding access to language technologies for communities historically left out of the current machine intelligence ecosystem.
Submitted 3 June, 2025;
originally announced June 2025.
Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara
Authors:
Allahsera Auguste Tapo,
Bakary Coulibaly,
Sébastien Diarra,
Christopher Homan,
Julia Kreutzer,
Sarah Luger,
Arthur Nagashima,
Marcos Zampieri,
Michael Leventhal
Abstract:
Low-resource languages present unique challenges to (neural) machine translation. We discuss the case of Bambara, a Mande language for which training data is scarce and requires significant amounts of pre-processing. More than the linguistic situation of Bambara itself, the socio-cultural context within which Bambara speakers live poses challenges for automated processing of this language. In this paper, we present the first parallel data set for machine translation of Bambara into and from English and French and the first benchmark results on machine translation to and from Bambara. We discuss challenges in working with low-resource languages and propose strategies to cope with data scarcity in low-resource machine translation (MT).
Submitted 10 November, 2020;
originally announced November 2020.