
Showing 1–50 of 90 results for author: Gales, M

  1. arXiv:2506.02758  [pdf, ps, other]

    cs.CL cs.AI

    Exploiting the English Vocabulary Profile for L2 word-level vocabulary assessment with LLMs

    Authors: Stefano Bannò, Kate Knill, Mark Gales

    Abstract: Vocabulary use is a fundamental aspect of second language (L2) proficiency. To date, its assessment by automated systems has typically examined the context-independent, or part-of-speech (PoS) related use of words. This paper introduces a novel approach to enable fine-grained vocabulary evaluation exploiting the precise use of words within a sentence. The scheme combines large language models (LLM…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: Accepted to the 20th Workshop on Innovative Use of NLP for Building Educational Applications

  2. arXiv:2505.21148  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Assessment of L2 Oral Proficiency using Speech Large Language Models

    Authors: Rao Ma, Mengjie Qian, Siyuan Tang, Stefano Bannò, Kate M. Knill, Mark J. F. Gales

    Abstract: The growing population of L2 English speakers has increased the demand for developing automatic graders for spoken language assessment (SLA). Historically, statistical models, text encoders, and self-supervised speech models have been utilised for this task. However, cascaded systems suffer from the loss of information, while E2E graders also have limitations. With the recent advancements of multi…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: submitted to Interspeech

  3. arXiv:2505.21137  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Scaling and Prompting for Improved End-to-End Spoken Grammatical Error Correction

    Authors: Mengjie Qian, Rao Ma, Stefano Bannò, Kate M. Knill, Mark J. F. Gales

    Abstract: Spoken Grammatical Error Correction (SGEC) and Feedback (SGECF) are crucial for second language learners, teachers and test takers. Traditional SGEC systems rely on a cascaded pipeline consisting of an ASR, a module for disfluency detection (DD) and removal and one for GEC. With the rise of end-to-end (E2E) speech foundation models, we investigate their effectiveness in SGEC and feedback generatio…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: submitted to Interspeech

  4. arXiv:2505.20529  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    Training Articulatory Inversion Models for Interspeaker Consistency

    Authors: Charles McGhee, Mark J. F. Gales, Kate M. Knill

    Abstract: Acoustic-to-Articulatory Inversion (AAI) attempts to model the inverse mapping from speech to articulation. Exact articulatory prediction from speech alone may be impossible, as speakers can choose different forms of articulation seemingly without reference to their vocal tract structure. However, once a speaker has selected an articulatory form, their productions vary minimally. Recent works in A…

    Submitted 9 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  5. arXiv:2505.15240  [pdf, ps, other]

    cs.AI cs.LG stat.ML

    Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-judge

    Authors: Yassir Fathullah, Mark J. F. Gales

    Abstract: This paper explores generalised probabilistic modelling and uncertainty estimation in comparative LLM-as-a-judge frameworks. We show that existing Product-of-Experts methods are specific cases of a broader framework, enabling diverse modelling options. Furthermore, we propose improved uncertainty estimates for individual comparisons, enabling more efficient selection and achieving strong performan…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: To appear in UAI 2025

  6. arXiv:2505.14286  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs

    Authors: Rao Ma, Mengjie Qian, Vyas Raina, Mark Gales, Kate Knill

    Abstract: The combination of pre-trained speech encoders with large language models has enabled the development of speech LLMs that can handle a wide range of spoken language processing tasks. While these models are powerful and flexible, this very flexibility may make them more vulnerable to adversarial attacks. To examine the extent of this problem, in this work we investigate universal acoustic adversari…

    Submitted 20 May, 2025; originally announced May 2025.

  7. arXiv:2505.02884  [pdf, other]

    cs.LG cs.AI

    Unlearning vs. Obfuscation: Are We Truly Removing Knowledge?

    Authors: Guangzhi Sun, Potsawee Manakul, Xiao Zhan, Mark Gales

    Abstract: Unlearning has emerged as a critical capability for large language models (LLMs) to support data privacy, regulatory compliance, and ethical AI deployment. Recent techniques often rely on obfuscation by injecting incorrect or irrelevant information to suppress knowledge. Such methods effectively constitute knowledge addition rather than true removal, often leaving models vulnerable to probing. In…

    Submitted 5 May, 2025; originally announced May 2025.

  8. arXiv:2504.18950  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness

    Authors: Erfan Loweimi, Mengjie Qian, Kate Knill, Mark Gales

    Abstract: There is a growing abundance of publicly available or company-owned audio/video archives, highlighting the increasing importance of efficient access to desired content and information retrieval from these archives. This paper investigates the challenges, solutions, effectiveness, and robustness of speaker retrieval systems developed "in the wild" which involves addressing two primary challenges: e…

    Submitted 29 April, 2025; v1 submitted 26 April, 2025; originally announced April 2025.

    Comments: 13 pages, 10 figures, 10 tables, 76 references

  9. arXiv:2412.11986  [pdf, other]

    cs.CL

    Speak & Improve Corpus 2025: an L2 English Speech Corpus for Language Assessment and Feedback

    Authors: Kate Knill, Diane Nicholls, Mark J. F. Gales, Mengjie Qian, Pawel Stroinski

    Abstract: We introduce the Speak & Improve Corpus 2025, a dataset of L2 learner English data with holistic scores and language error annotation, collected from open (spontaneous) speaking tests on the Speak & Improve learning platform. The aim of the corpus release is to address a major challenge to developing L2 spoken language processing systems, the lack of publicly available data with high-quality annot…

    Submitted 17 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

  10. arXiv:2412.11985  [pdf, other]

    cs.CL

    Speak & Improve Challenge 2025: Tasks and Baseline Systems

    Authors: Mengjie Qian, Kate Knill, Stefano Banno, Siyuan Tang, Penny Karanasou, Mark J. F. Gales, Diane Nicholls

    Abstract: This paper presents the "Speak & Improve Challenge 2025: Spoken Language Assessment and Feedback" -- a challenge associated with the ISCA SLaTE 2025 Workshop. The goal of the challenge is to advance research on spoken language assessment and feedback, with tasks associated with both the underlying technology and language learning feedback. Linked with the challenge, the Speak & Improve (S&I) Corpu…

    Submitted 17 December, 2024; v1 submitted 16 December, 2024; originally announced December 2024.

  11. arXiv:2410.10215  [pdf, other]

    cs.CL cs.LG

    SkillAggregation: Reference-free LLM-Dependent Aggregation

    Authors: Guangzhi Sun, Anmol Kagrecha, Potsawee Manakul, Phil Woodland, Mark Gales

    Abstract: Large Language Models (LLMs) are increasingly used to assess NLP tasks due to their ability to generate human-like judgments. Single LLMs were used initially, however, recent work suggests using multiple LLMs as judges yields improved performance. An important step in exploiting multiple judgements is the combination stage, aggregation. Existing methods in NLP either assign equal weight to all LLM…

    Submitted 14 October, 2024; originally announced October 2024.

  12. arXiv:2409.15979  [pdf, other]

    cs.CL

    Finetuning LLMs for Comparative Assessment Tasks

    Authors: Vatsal Raina, Adian Liusie, Mark Gales

    Abstract: Automated assessment in natural language generation is a challenging task. Instruction-tuned large language models (LLMs) have shown promise in reference-free evaluation, particularly through comparative assessment. However, the quadratic computational complexity of pairwise comparisons limits its scalability. To address this, efficient comparative assessment has been explored by applying comparat…

    Submitted 24 September, 2024; originally announced September 2024.

    Comments: 8 pages, 5 figures, 6 tables

  13. arXiv:2409.09554  [pdf, other]

    cs.CL cs.SD eess.AS

    ASR Error Correction using Large Language Models

    Authors: Rao Ma, Mengjie Qian, Mark Gales, Kate Knill

    Abstract: Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions, enhancing the readability and quality of transcriptions. Without requiring access to the underlying code or model weights, EC can improve performance and provide domain adaptation for black-box ASR systems. This work investigates the use of large language models (LLMs) for error correcti…

    Submitted 18 January, 2025; v1 submitted 14 September, 2024; originally announced September 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  14. arXiv:2408.09565  [pdf, other]

    cs.CL cs.AI

    Grammatical Error Feedback: An Implicit Evaluation Approach

    Authors: Stefano Bannò, Kate Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. This holistic feedback will be referred to as grammatical error feedback (GEF). In this paper, we present a…

    Submitted 18 August, 2024; originally announced August 2024.

  15. arXiv:2407.06800  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Learn and Don't Forget: Adding a New Language to ASR Foundation Models

    Authors: Mengjie Qian, Siyuan Tang, Rao Ma, Kate M. Knill, Mark J. F. Gales

    Abstract: Foundation ASR models often support many languages, e.g. 100 languages in Whisper. However, there has been limited work on integrating an additional, typically low-resource, language, while maintaining performance on the original language set. Fine-tuning, while simple, may degrade the accuracy of the original set. We compare three approaches that exploit adaptation parameters: soft language code…

    Submitted 24 September, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: Proceedings of Interspeech

  16. arXiv:2407.04482  [pdf, other]

    cs.SD cs.CL eess.AS

    Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models

    Authors: Vyas Raina, Mark Gales

    Abstract: Speech enabled foundation models, either in the form of flexible speech recognition based systems or audio-prompted large language models (LLMs), are becoming increasingly popular. One of the interesting aspects of these models is their ability to perform tasks other than automatic speech recognition (ASR) using an appropriate prompt. For example, the OpenAI Whisper model can perform both speech t…

    Submitted 11 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  17. arXiv:2407.01130  [pdf, other]

    cs.CL

    Cross-Lingual Transfer Learning for Speech Translation

    Authors: Rao Ma, Mengjie Qian, Yassir Fathullah, Siyuan Tang, Mark Gales, Kate Knill

    Abstract: There has been increasing interest in building multilingual foundation models for NLP and speech research. This paper examines how to expand the speech translation capability of these models with restricted data. Whisper, a speech foundation model with strong performance on speech recognition and English translation, is used as the example model. Using speech-to-speech retrieval to analyse the aud…

    Submitted 11 February, 2025; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by NAACL 2025

  18. arXiv:2405.13684  [pdf, other]

    cs.CL

    CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models

    Authors: Guangzhi Sun, Potsawee Manakul, Adian Liusie, Kunat Pipatanakul, Chao Zhang, Phil Woodland, Mark Gales

    Abstract: Multimodal foundation models are prone to hallucination, generating outputs that either contradict the input or are not grounded by factual information. Given the diversity in architectures, training data and instruction tuning techniques, there can be large variations in systems' susceptibility to hallucinations. To assess system hallucination robustness, hallucination ranking approaches have bee…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 21 pages. Preprint

  19. arXiv:2405.12363  [pdf, other]

    cs.CL

    Question-Based Retrieval using Atomic Units for Enterprise RAG

    Authors: Vatsal Raina, Mark Gales

    Abstract: Enterprise retrieval augmented generation (RAG) offers a highly flexible framework for combining powerful large language models (LLMs) with internal, possibly temporally changing, documents. In RAG, documents are first chunked. Relevant chunks are then retrieved for a user query, which are passed as context to a synthesizer LLM to generate the query response. However, the retrieval step can limit…

    Submitted 30 August, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures, 5 tables

  20. arXiv:2405.06134  [pdf, other]

    cs.CL cs.SD eess.AS

    Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

    Authors: Vyas Raina, Rao Ma, Charles McGhee, Kate Knill, Mark Gales

    Abstract: Recent developments in large speech foundation models like Whisper have led to their widespread use in many automatic speech recognition (ASR) applications. These systems incorporate `special tokens' in their vocabulary, such as $\texttt{<|endoftext|>}$, to guide their language generation process. However, we demonstrate that these tokens can be exploited by adversarial attacks to manipulate the m…

    Submitted 17 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  21. arXiv:2405.05894  [pdf, other]

    cs.CL

    Efficient LLM Comparative Assessment: a Product of Experts Framework for Pairwise Comparisons

    Authors: Adian Liusie, Vatsal Raina, Yassir Fathullah, Mark Gales

    Abstract: LLM-as-a-judge approaches are a practical and effective way of assessing a range of text tasks. However, when using pairwise comparisons to rank a set of candidates, the computational cost scales quadratically with the number of candidates, which has practical limitations. This paper introduces a Product of Expert (PoE) framework for efficient LLM Comparative Assessment. Here individual comparison…

    Submitted 12 November, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  22. arXiv:2405.01601  [pdf, other]

    cs.CL cs.LG

    Efficient Sample-Specific Encoder Perturbations

    Authors: Yassir Fathullah, Mark J. F. Gales

    Abstract: Encoder-decoder foundation models have displayed state-of-the-art performance on a range of autoregressive sequence tasks. This paper proposes a simple and lightweight modification to such systems to control the behaviour according to a specific attribute of interest. This paper proposes a novel inference-efficient approach to modifying the behaviour of an encoder-decoder system according to a spe…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: To appear in NAACL 2024

  23. arXiv:2404.18557  [pdf, other]

    cs.CL

    Can GPT-4 do L2 analytic assessment?

    Authors: Stefano Bannò, Hari Krishna Vydana, Kate M. Knill, Mark J. F. Gales

    Abstract: Automated essay scoring (AES) to evaluate second language (L2) proficiency has been a firmly established technology used in educational contexts for decades. Although holistic scoring has seen advancements in AES that match or even exceed human performance, analytic scoring still encounters issues as it inherits flaws and shortcomings from the human scoring process. The recent introduction of larg…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: Accepted for the 19th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2024)

  24. arXiv:2404.10704  [pdf, other]

    cs.CL cs.AI

    Question Difficulty Ranking for Multiple-Choice Reading Comprehension

    Authors: Vatsal Raina, Mark Gales

    Abstract: Multiple-choice (MC) tests are an efficient method to assess English learners. It is useful for test creators to rank candidate MC questions by difficulty during exam curation. Typically, the difficulty is determined by having human test takers trial the questions in a pretesting stage. However, this is expensive and not scalable. Therefore, we explore automated approaches to rank MC questions by…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 7 pages, 3 figures

  25. arXiv:2403.19548  [pdf, other]

    cs.CL

    WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models

    Authors: Piotr Molenda, Adian Liusie, Mark J. F. Gales

    Abstract: Watermarking generative-AI systems, such as LLMs, has gained considerable interest, driven by their enhanced capabilities across a wide range of tasks. Although current approaches have demonstrated that small, context-dependent shifts in the word distributions can be used to apply and detect watermarks, there has been little work in analyzing the impact that these perturbations have on the quality…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: NAACL 2024 (Findings)

  26. arXiv:2403.13590  [pdf, other]

    cs.CL

    Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models

    Authors: Adian Liusie, Yassir Fathullah, Mark J. F. Gales

    Abstract: Large Language Models (LLMs) have demonstrated impressive zero-shot capabilities and versatility in NLP tasks, however they sometimes fail to maintain crucial invariances for specific tasks. One example is permutation sensitivity, where LLMs' outputs may significantly vary depending on the order of the input options. While debiasing techniques can mitigate these issues, and yield better performanc…

    Submitted 20 March, 2024; originally announced March 2024.

  27. arXiv:2402.18216  [pdf, other]

    cs.CL

    LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History

    Authors: Akash Gupta, Ivaxi Sheth, Vyas Raina, Mark Gales, Mario Fritz

    Abstract: With the recent emergence of powerful instruction-tuned large language models (LLMs), various helpful conversational Artificial Intelligence (AI) systems have been deployed across many applications. When prompted by users, these AI systems successfully perform a wide range of tasks as part of a conversation. To provide some sort of memory and context, such approaches typically condition their outp…

    Submitted 11 October, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 20 pages, 13 figures, 20 tables, EMNLP Main Conference 2024

  28. arXiv:2402.14016  [pdf, other]

    cs.CL

    Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment

    Authors: Vyas Raina, Adian Liusie, Mark Gales

    Abstract: Large Language Models (LLMs) are powerful zero-shot assessors used in real-world situations such as assessing written exams and benchmarking systems. Despite these critical applications, no existing work has analyzed the vulnerability of judge-LLMs to adversarial manipulation. This work presents the first study on the adversarial robustness of assessment LLMs, where we demonstrate that short unive…

    Submitted 4 July, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  29. arXiv:2402.00978  [pdf, other]

    cs.CL cs.AI cs.IT

    An Information-Theoretic Approach to Analyze NLP Classification Tasks

    Authors: Luran Wang, Mark Gales, Vatsal Raina

    Abstract: Understanding the importance of the inputs on the output is useful across many tasks. This work provides an information-theoretic framework to analyse the influence of inputs for text classification tasks. Natural language processing (NLP) tasks take either a single element input or multiple element inputs to predict an output variable, where an element is a block of text. Each text element has tw…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: 21 pages, 10 figures, 11 tables

  30. arXiv:2311.09363  [pdf, other]

    cs.CL

    Investigating the Emergent Audio Classification Ability of ASR Foundation Models

    Authors: Rao Ma, Adian Liusie, Mark J. F. Gales, Kate M. Knill

    Abstract: Text and vision foundation models can perform many tasks in a zero-shot setting, a desirable property that enables these systems to be applied in general and low-resource settings. There has been far less work, however, on the zero-shot abilities of ASR foundation models, with these systems typically fine-tuned to specific tasks or constrained to applications that match their training criterion an…

    Submitted 28 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 (main conference)

  31. Structural-Based Uncertainty in Deep Learning Across Anatomical Scales: Analysis in White Matter Lesion Segmentation

    Authors: Nataliia Molchanova, Vatsal Raina, Andrey Malinin, Francesco La Rosa, Adrien Depeursinge, Mark Gales, Cristina Granziera, Henning Muller, Mara Graziani, Meritxell Bach Cuadra

    Abstract: This paper explores uncertainty quantification (UQ) as an indicator of the trustworthiness of automated deep-learning (DL) tools in the context of white matter lesion (WML) segmentation from magnetic resonance imaging (MRI) scans of multiple sclerosis (MS) patients. Our study focuses on two principal aspects of uncertainty in structured output segmentation tasks. First, we postulate that a reliabl…

    Submitted 18 November, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Journal ref: Computers in Biology and Medicine 184(2025)109336

  32. arXiv:2311.05550  [pdf, other]

    cs.CL cs.LG eess.AS

    Towards End-to-End Spoken Grammatical Error Correction

    Authors: Stefano Bannò, Rao Ma, Mengjie Qian, Kate M. Knill, Mark J. F. Gales

    Abstract: Grammatical feedback is crucial for L2 learners, teachers, and testers. Spoken grammatical error correction (GEC) aims to supply feedback to L2 learners on their use of grammar when speaking. This process usually relies on a cascaded pipeline comprising an ASR system, disfluency removal, and GEC, with the associated concern of propagating errors between these individual modules. In this paper, we…

    Submitted 19 July, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

  33. arXiv:2311.04554  [pdf, other]

    cs.CL

    Assessing Distractors in Multiple-Choice Tests

    Authors: Vatsal Raina, Adian Liusie, Mark Gales

    Abstract: Multiple-choice tests are a common approach for assessing candidates' comprehension skills. Standard multiple-choice reading comprehension exams require candidates to select the correct answer option from a discrete set based on a question in relation to a contextual passage. For appropriate assessment, the distractor answer options must by definition be incorrect but plausible and diverse. Howeve…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted at the 4th Workshop on Evaluation and Comparison of NLP Systems @ AACL 2023

  34. arXiv:2309.12551  [pdf, other]

    cs.CL

    Is it Possible to Modify Text to a Target Readability Level? An Initial Investigation Using Zero-Shot Large Language Models

    Authors: Asma Farajidizaji, Vatsal Raina, Mark Gales

    Abstract: Text simplification is a common task where the text is adapted to make it easier to understand. Similarly, text elaboration can make a passage more sophisticated, offering a method to control the complexity of reading comprehension tests. However, text simplification and elaboration tasks are limited to only relatively alter the readability of texts. It is useful to directly modify the readability…

    Submitted 27 May, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: 11 pages, 4 figures, 5 tables

  35. arXiv:2309.07606  [pdf, other]

    cs.CL cs.IR

    Zero-shot Audio Topic Reranking using Large Language Models

    Authors: Mengjie Qian, Rao Ma, Adian Liusie, Erfan Loweimi, Kate M. Knill, Mark J. F. Gales

    Abstract: Multimodal Video Search by Examples (MVSE) investigates using video clips as the query term for information retrieval, rather than the more traditional text query. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element for this process is highly rapid and flexible search to support large archives, which in MVSE is facilitated by representing v…

    Submitted 10 September, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

  36. arXiv:2309.06520  [pdf, other]

    cs.CL cs.AI

    Minimum Bayes' Risk Decoding for System Combination of Grammatical Error Correction Systems

    Authors: Vyas Raina, Mark Gales

    Abstract: For sequence-to-sequence tasks it is challenging to combine individual system outputs. Further, there is also often a mismatch between the decoding criterion and the one used for assessment. Minimum Bayes' Risk (MBR) decoding can be used to combine system outputs in a manner that encourages better alignment with the final assessment criterion. This paper examines MBR decoding for Grammatical Error…

    Submitted 27 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

  37. arXiv:2309.04992  [pdf, other]

    cs.CL

    Mitigating Word Bias in Zero-shot Prompt-based Classifiers

    Authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales

    Abstract: Prompt-based classifiers are an attractive approach for zero-shot classification. However, the precise choice of the prompt template and label words can largely influence performance, with semantically equivalent settings often showing notable performance difference. This discrepancy can be partly attributed to word biases, where the classifier may be biased towards classes. To address this proble…

    Submitted 10 September, 2023; originally announced September 2023.

  38. arXiv:2307.09378  [pdf, other]

    cs.CL cs.SD eess.AS

    Adapting an ASR Foundation Model for Spoken Language Assessment

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: A crucial part of an accurate and reliable spoken language assessment system is the underlying ASR model. Recently, large-scale pre-trained ASR foundation models such as Whisper have been made available. As the output of these models is designed to be human readable, punctuation is added, numbers are presented in Arabic numeric form and abbreviations are included. Additionally, these models have a…

    Submitted 10 October, 2023; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: Proceedings of SLaTE

  39. arXiv:2307.07889  [pdf, other]

    cs.CL

    LLM Comparative Assessment: Zero-shot NLG Evaluation through Pairwise Comparisons using Large Language Models

    Authors: Adian Liusie, Potsawee Manakul, Mark J. F. Gales

    Abstract: Current developments in large language models (LLMs) have enabled impressive zero-shot capabilities across various natural language tasks. An interesting application of these systems is in the automated assessment of natural language generation (NLG), a highly challenging area with great practical benefit. In this paper, we explore two options for exploiting the emergent abilities of LLMs for zero…

    Submitted 6 February, 2024; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: To Appear at EACL 2024

  40. arXiv:2307.04172  [pdf, other]

    cs.CL cs.SD eess.AS

    Can Generative Large Language Models Perform ASR Error Correction?

    Authors: Rao Ma, Mengjie Qian, Potsawee Manakul, Mark Gales, Kate Knill

    Abstract: ASR error correction is an interesting option for post processing speech recognition system outputs. These error correction models are usually trained in a supervised fashion using the decoding results of a target ASR system. This approach can be computationally intensive and the model is tuned to a specific ASR system. Recently generative large language models (LLMs) have been applied to a wide r…

    Submitted 29 September, 2023; v1 submitted 9 July, 2023; originally announced July 2023.

  41. arXiv:2307.01076  [pdf, other]

    cs.CL

    Analyzing Multiple-Choice Reading and Listening Comprehension Tests

    Authors: Vatsal Raina, Adian Liusie, Mark Gales

    Abstract: Multiple-choice reading and listening comprehension tests are an important part of language assessment. Content creators for standard educational tests need to carefully curate questions that assess the comprehension abilities of candidates taking the tests. However, recent work has shown that a large number of questions in general multiple-choice reading comprehension datasets can be answered wit…

    Submitted 3 July, 2023; originally announced July 2023.

    Comments: 5 pages, 3 figures, accepted at SLaTE-2023

  42. arXiv:2306.13047  [pdf, other]

    cs.CL

    Analysis of the Cambridge Multiple-Choice Questions Reading Dataset with a Focus on Candidate Response Distribution

    Authors: Adian Liusie, Vatsal Raina, Andrew Mullooly, Kate Knill, Mark J. F. Gales

    Abstract: Multiple choice exams are widely used to assess candidates across a diverse range of domains and tasks. To moderate question quality, newly proposed questions often pass through pre-test evaluation stages before being deployed into real-world exams. Currently, this evaluation process is manually intensive, which can lead to time lags in the question development cycle. Streamlining this process via…

    Submitted 15 October, 2023; v1 submitted 22 June, 2023; originally announced June 2023.

  43. arXiv:2306.12043  [pdf, other]

    cs.CL cs.AI

    Sample Attackability in Natural Language Adversarial Attacks

    Authors: Vyas Raina, Mark Gales

    Abstract: Adversarial attack research in natural language processing (NLP) has made significant progress in designing powerful attack methods and defence approaches. However, few efforts have sought to identify which source samples are the most attackable or robust, i.e. can we determine for an unseen target model, which samples are the most vulnerable to an adversarial attack. This work formally extends th…

    Submitted 21 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: text overlap with arXiv:2301.12896

  44. arXiv:2306.05317  [pdf, other]

    cs.CL

    CUED at ProbSum 2023: Hierarchical Ensemble of Summarization Models

    Authors: Potsawee Manakul, Yassir Fathullah, Adian Liusie, Vyas Raina, Vatsal Raina, Mark Gales

    Abstract: In this paper, we consider the challenge of summarizing patients' medical progress notes in a limited data setting. For the Problem List Summarization (shared task 1A) at the BioNLP Workshop 2023, we demonstrate that Clinical-T5 fine-tuned to 765 medical clinic notes outperforms other extractive, abstractive and zero-shot baselines, yielding reasonable baseline systems for medical note summarizati…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: BioNLP Workshop @ ACL 2023

  45. Adapting an Unadaptable ASR System

    Authors: Rao Ma, Mengjie Qian, Mark J. F. Gales, Kate M. Knill

    Abstract: As speech recognition model sizes and training data requirements grow, it is increasingly common for systems to only be available via APIs from online service providers rather than having direct access to models themselves. In this scenario it is challenging to adapt systems to a specific target domain. To address this problem we consider the recently released OpenAI Whisper ASR as an example of a…

    Submitted 10 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Proceedings of INTERSPEECH

  46. arXiv:2305.12498  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    Multi-Head State Space Model for Speech Recognition

    Authors: Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, Mark J. F. Gales

    Abstract: State space models (SSMs) have recently shown promising results on small-scale sequence and language modelling tasks, rivalling and outperforming many attention-based approaches. In this paper, we propose a multi-head state space (MH-SSM) architecture equipped with special gating mechanisms, where parallel heads are taught to learn local and global temporal dynamics on sequence data. As a drop-in…

    Submitted 25 May, 2023; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Interspeech 2023

  47. arXiv:2305.10384  [pdf, other]

    cs.LG cs.CL

    Logit-Based Ensemble Distribution Distillation for Robust Autoregressive Sequence Uncertainties

    Authors: Yassir Fathullah, Guoxuan Xia, Mark Gales

    Abstract: Efficiently and reliably estimating uncertainty is an important objective in deep learning. It is especially pertinent to autoregressive sequence tasks, where training and inference costs are typically very high. However, existing research has predominantly focused on tasks with static data such as image classification. In this work, we investigate Ensemble Distribution Distillation (EDD) applied…

    Submitted 17 May, 2023; originally announced May 2023.

    Comments: Accepted to UAI 2023, preliminary version

  48. arXiv:2305.05098  [pdf, other]

    cs.LG cs.AI cs.CL

    Who Needs Decoders? Efficient Estimation of Sequence-level Attributes

    Authors: Yassir Fathullah, Puria Radmard, Adian Liusie, Mark J. F. Gales

    Abstract: State-of-the-art sequence-to-sequence models often require autoregressive decoding, which can be highly expensive. However, for some downstream tasks such as out-of-distribution (OOD) detection and resource allocation, the actual decoding output is not needed just a scalar attribute of this sequence. In these scenarios, where for example knowing the quality of a system's output to predict poor per…

    Submitted 8 May, 2023; originally announced May 2023.

  49. arXiv:2305.01437  [pdf, other]

    cs.CL cs.AI

    Sentiment Perception Adversarial Attacks on Neural Machine Translation Systems

    Authors: Vyas Raina, Mark Gales

    Abstract: With the advent of deep learning methods, Neural Machine Translation (NMT) systems have become increasingly powerful. However, deep learning based systems are susceptible to adversarial attacks, where imperceptible changes to the input can cause undesirable changes at the output of the system. To date there has been little work investigating adversarial attacks on sequence-to-sequence systems, suc…

    Submitted 24 June, 2023; v1 submitted 2 May, 2023; originally announced May 2023.

  50. arXiv:2304.00714  [pdf, other]

    eess.AS

    Ensemble prosody prediction for expressive speech synthesis

    Authors: Tian Huey Teh, Vivian Hu, Devang S Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gomez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King

    Abstract: Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ens…

    Submitted 3 April, 2023; originally announced April 2023.

    Comments: ICASSP 2023