Showing 1–4 of 4 results for author: Lonergan, L

  1. arXiv:2501.00509

    cs.CL cs.SD eess.AS

    Fotheidil: an Automatic Transcription System for the Irish Language

    Authors: Liam Lonergan, Ibon Saratxaga, John Sloan, Oscar Maharg, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide

    Abstract: This paper sets out the first web-based transcription system for the Irish language - Fotheidil, a system that utilises speech-related AI technologies as part of the ABAIR initiative. The system includes both off-the-shelf pre-trained voice activity detection and speaker diarisation models and models trained specifically for Irish automatic speech recognition and capitalisation and punctuation res…

    Submitted 31 December, 2024; originally announced January 2025.

    Comments: Accepted to the 5th Celtic Language Technology Workshop within COLING 2025

  2. arXiv:2405.01293

    cs.CL cs.AI cs.SD eess.AS

    Low-resource speech recognition and dialect identification of Irish in a multi-task framework

    Authors: Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide

    Abstract: This paper explores the use of Hybrid CTC/Attention encoder-decoder models trained with Intermediate CTC (InterCTC) for Irish (Gaelic) low-resource speech recognition (ASR) and dialect identification (DID). Results are compared to the current best performing models trained for ASR (TDNN-HMM) and DID (ECAPA-TDNN). An optimal InterCTC setting is initially established using a Conformer encoder. This…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: 7 pages. Accepted to Odyssey 2024 - The Speaker and Language Recognition Workshop

  3. arXiv:2307.07436

    cs.CL cs.SD eess.AS

    Towards spoken dialect identification of Irish

    Authors: Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide

    Abstract: The Irish language is rich in its diversity of dialects and accents. This compounds the difficulty of creating a speech recognition system for the low-resource language, as such a system must contend with a high degree of variability with limited corpora. A recent study investigating dialect bias in Irish ASR found that balanced training corpora gave rise to unequal dialect performance, with perfo…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted to Interspeech 2023 Workshop of the 2nd Annual Meeting of the Special Interest Group of Under-resourced Languages Workshop, Dublin (SiGUL)

  4. arXiv:2307.07295

    cs.CL cs.SD eess.AS

    Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?

    Authors: Liam Lonergan, Mengjie Qian, Neasa Ní Chiaráin, Christer Gobl, Ailbhe Ní Chasaide

    Abstract: ASR systems are generally built for the spoken 'standard', and their performance declines for non-standard dialects/varieties. This is a problem for a language like Irish, where there is no single spoken standard, but rather three major dialects: Ulster (Ul), Connacht (Co) and Munster (Mu). As a diagnostic to quantify the effect of the speaker's dialect on recognition performance, 12 ASR systems w…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted to Interspeech 2023, Dublin