-
Automatic Speech Recognition for African Low-Resource Languages: Challenges and Future Directions
Authors:
Sukairaj Hafiz Imam,
Babangida Sani,
Dawit Ketema Gete,
Bedru Yimam Ahamed,
Ibrahim Said Ahmad,
Idris Abdulmumin,
Seid Muhie Yimam,
Muhammad Yahuza Bello,
Shamsuddeen Hassan Muhammad
Abstract:
Automatic Speech Recognition (ASR) technologies have transformed human-computer interaction; however, low-resource languages in Africa remain significantly underrepresented in both research and practical applications. This study investigates the major challenges hindering the development of ASR systems for these languages, which include data scarcity, linguistic complexity, limited computational resources, acoustic variability, and ethical concerns surrounding bias and privacy. The primary goal is to critically analyze these barriers and identify practical, inclusive strategies to advance ASR technologies within the African context. Recent advances and case studies emphasize promising strategies such as community-driven data collection, self-supervised and multilingual learning, lightweight model architectures, and techniques that prioritize privacy. Evidence from pilot projects involving various African languages showcases the feasibility and impact of customized solutions, which encompass morpheme-based modeling and domain-specific ASR applications in sectors like healthcare and education. The findings highlight the importance of interdisciplinary collaboration and sustained investment to tackle the distinct linguistic and infrastructural challenges faced by the continent. This study offers a progressive roadmap for creating ethical, efficient, and inclusive ASR systems that not only safeguard linguistic diversity but also improve digital accessibility and promote socioeconomic participation for speakers of African languages.
Submitted 16 May, 2025;
originally announced May 2025.
-
Whispering in Amharic: Fine-tuning Whisper for Low-resource Language
Authors:
Dawit Ketema Gete,
Bedru Yimam Ahmed,
Tadesse Destaw Belay,
Yohannes Ayana Ejigu,
Sukairaj Hafiz Imam,
Alemu Belay Tessema,
Mohammed Oumer Adem,
Tadesse Amare Belay,
Robert Geislinger,
Umma Aliyu Musa,
Martin Semmann,
Shamsuddeen Hassan Muhammad,
Henning Schreiber,
Seid Muhie Yimam
Abstract:
This work explores fine-tuning OpenAI's Whisper automatic speech recognition (ASR) model for Amharic, a low-resource language, to improve transcription accuracy. While the foundational Whisper model struggles with Amharic due to limited representation in its training data, we fine-tune it using datasets like Mozilla Common Voice, FLEURS, and the BDU-speech dataset. The best-performing model, Whispersmall-am, improves significantly when fine-tuned on a mix of existing FLEURS data and new, unseen Amharic datasets. Training solely on new data leads to poor performance, but combining it with FLEURS data reinforces the model, enabling better specialization in Amharic. We also demonstrate that normalizing Amharic homophones significantly improves Word Error Rate (WER) and Bilingual Evaluation Understudy (BLEU) scores. This study underscores the importance of fine-tuning strategies and dataset composition for improving ASR in low-resource languages, providing insights for future Amharic speech recognition research.
Submitted 28 March, 2025; v1 submitted 24 March, 2025;
originally announced March 2025.
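The homophone normalization mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the choice of which letter in each homophone series counts as canonical, and the restriction to the first seven orders of each series are all assumptions made for illustration. The sketch folds homophonous Ethiopic letters onto one form before computing a word-level WER.

```python
# Illustrative sketch only; the mapping below is an assumption, not the paper's table.

def build_homophone_table():
    """Map homophonous Ethiopic series onto one (assumed) canonical series."""
    # (variant series start, canonical series start) as Unicode code points.
    pairs = [
        (0x1210, 0x1200),  # ሐ series -> ሀ series
        (0x1280, 0x1200),  # ኀ series -> ሀ series
        (0x1220, 0x1230),  # ሠ series -> ሰ series
        (0x12D0, 0x12A0),  # ዐ series -> አ series
        (0x1340, 0x1338),  # ፀ series -> ጸ series
    ]
    table = {}
    for variant, canonical in pairs:
        for order in range(7):  # the seven basic vowel orders of each series
            table[variant + order] = chr(canonical + order)
    return table


HOMOPHONES = build_homophone_table()


def normalize(text):
    """Replace each variant homophone character with its canonical form."""
    return text.translate(HOMOPHONES)


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

Calling `wer(normalize(ref), normalize(hyp))` instead of `wer(ref, hyp)` stops spelling variants that sound identical from being counted as recognition errors, which is one plausible reason normalization improves the reported scores.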
-
Enhancing Multi-Label Emotion Analysis and Corresponding Intensities for Ethiopian Languages
Authors:
Tadesse Destaw Belay,
Dawit Ketema Gete,
Abinew Ali Ayele,
Olga Kolesnikova,
Grigori Sidorov,
Seid Muhie Yimam
Abstract:
In this digital world, people freely express their emotions using different social media platforms. As a result, building and integrating emotion-understanding models is vital for various human-computer interaction tasks such as decision-making, product and customer feedback analysis, political promotions, marketing research, and social media monitoring. As users express different emotions simultaneously in a single instance, annotating emotions in a multi-label setting such as the EthioEmo (Belay et al., 2025) dataset effectively captures this dynamic. Additionally, incorporating intensity, or the degree of emotion, is crucial, as emotions can significantly differ in their expressive strength and impact. This intensity is significant for assessing whether further action is necessary in decision-making processes, especially concerning negative emotions in applications such as healthcare and mental health studies. To enhance the EthioEmo dataset, we include annotations for the intensity of each labeled emotion. Furthermore, we evaluate various state-of-the-art encoder-only Pretrained Language Models (PLMs) and decoder-only Large Language Models (LLMs) to provide comprehensive benchmarking.
Submitted 23 March, 2025;
originally announced March 2025.