Showing 1–13 of 13 results for author: Knill, K M

Searching in archive cs.
  1. arXiv:2506.18532

    cs.CL cs.LG cs.SD eess.AS

    End-to-End Spoken Grammatical Error Correction

    Authors: Mengjie Qian, Rao Ma, Stefano Bannò, Mark J. F. Gales, Kate M. Knill

    Abstract: Grammatical Error Correction (GEC) and feedback play a vital role in supporting second language (L2) learners, educators, and examiners. While written GEC is well-established, spoken GEC (SGEC), aiming to provide feedback based on learners' speech, poses additional challenges due to disfluencies, transcription errors, and the lack of structured input. SGEC systems typically follow a cascaded pipel…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: This work has been submitted to the IEEE for possible publication

  2. arXiv:2505.21148

    cs.CL cs.SD eess.AS

    Assessment of L2 Oral Proficiency using Speech Large Language Models

    Authors: Rao Ma, Mengjie Qian, Siyuan Tang, Stefano Bannò, Kate M. Knill, Mark J. F. Gales

    Abstract: The growing population of L2 English speakers has increased the demand for developing automatic graders for spoken language assessment (SLA). Historically, statistical models, text encoders, and self-supervised speech models have been utilised for this task. However, cascaded systems suffer from the loss of information, while E2E graders also have limitations. With the recent advancements of multi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: submitted to Interspeech

  3. arXiv:2505.21137

    cs.CL cs.SD eess.AS

    Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction

    Authors: Mengjie Qian, Rao Ma, Stefano Bannò, Kate M. Knill, Mark J. F. Gales

    Abstract: Spoken Grammatical Error Correction (SGEC) and Feedback (SGECF) are crucial for second language learners, teachers and test takers. Traditional SGEC systems rely on a cascaded pipeline consisting of an ASR, a module for disfluency detection (DD) and removal and one for GEC. With the rise of end-to-end (E2E) speech foundation models, we investigate their effectiveness in SGEC and feedback generatio…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: submitted to Interspeech

  4. arXiv:2505.20529

    cs.SD cs.LG eess.AS

    Training Articulatory Inversion Models for Interspeaker Consistency

    Authors: Charles McGhee, Mark J. F. Gales, Kate M. Knill

    Abstract: Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in A…

    Submitted 9 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  5. arXiv:2407.06800

    eess.AS cs.CL cs.LG cs.SD

    Learn and Don't Forget: Adding a New Language to ASR Foundation Models

    Authors: Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales

    Abstract: Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code…

    Submitted 24 September, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Proceedings of Interspeech

  6. arXiv:2404.18557

    cs.CL

    Can GPT-4 do L2 analytic assessment?

    Authors: Stefano Bannò, Hari Krishna Vydana, Kate M. Knill, Mark J. F. Gales

    Abstract: Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of larg…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted for the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

  7. arXiv:2311.09363

    cs.CL

    Investigating the Emergent Audio Classification Ability of ASR Foundation Models

    Authors: Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

    Abstract: Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings. There has been far less work, however, on the zero-shot abilities of ASR foundation models, with these systems typically fine-tuned to specific tasks or constrained to applications that match their training criterion an…

    Submitted 28 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (main conference)

  8. arXiv:2311.05550

    cs.CL cs.LG eess.AS

    Towards End-to-End Spoken Grammatical Error Correction

    Authors: Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we…

    Submitted 19 July, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  9. arXiv:2309.07606

    cs.CL cs.IR

    Zero-shot Audio Topic Reranking using Large Language Models

    Authors: Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate M. Knill, Mark J. F. Gales

    Abstract: Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element for this process is highly rapid and flexible search to support large archives, which in MVSE is facilitated by representing v…

    Submitted 10 September, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  10. arXiv:2307.09378

    cs.CL cs.SD eess.AS

    Adapting an ASR Foundation Model for Spoken Language Assessment

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a…

    Submitted 10 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Proceedings of SLaTE

  11. Adapting an Unadaptable ASR System

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a…

    Submitted 10 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  12. N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space

    Authors: Rao Ma, Mark J. F. Gales, Kate M. Knill, Mengjie Qian

    Abstract: Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior works use the 1-best ASR hypothesis as input and therefore can only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 mo…

    Submitted 10 October, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

    Comments: Proceedings of INTERSPEECH

  13. arXiv:2211.08849

    eess.AS cs.CL

    L2 proficiency assessment using self-supervised speech representations

    Authors: Stefano Bannò, Kate M. Knill, Marco Matassoni, Vyas Raina, Mark J. F. Gales

    Abstract: There has been a growing demand for automated spoken language assessment systems in recent years. A standard pipeline for this process is to start with a speech recognition system and derive features, either hand-crafted or based on deep-learning, that exploit the transcription and audio. Though these approaches can yield high performance systems, they require speech recognition systems that can b…

    Submitted 16 November, 2022; originally announced November 2022.