Showing 1–13 of 13 results for author: Mathews, R

Searching in archive eess.
  1. arXiv:2507.06256  [pdf, ps, other]

    cs.CR cs.AI cs.SD eess.AS

    Attacker's Noise Can Manipulate Your Audio-based LLM in the Real World

    Authors: Vinu Sankar Sadasivan, Soheil Feizi, Rajiv Mathews, Lun Wang

    Abstract: This paper investigates the real-world vulnerabilities of audio-based large language models (ALLMs), such as Qwen2-Audio. We first demonstrate that an adversary can craft stealthy audio perturbations to manipulate ALLMs into exhibiting specific targeted behaviors, such as eliciting responses to wake-keywords (e.g., "Hey Qwen"), or triggering harmful behaviors (e.g., "Change my calendar event"). Sub…

    Submitted 7 July, 2025; originally announced July 2025.

  2. arXiv:2408.11873  [pdf, other]

    eess.AS cs.CR cs.LG

    Parameter-Efficient Transfer Learning under Federated Learning for Automatic Speech Recognition

    Authors: Xuan Kan, Yonghui Xiao, Tien-Ju Yang, Nanxin Chen, Rajiv Mathews

    Abstract: This work explores the challenge of enhancing Automatic Speech Recognition (ASR) model performance across various user-specific domains while preserving user data privacy. We employ federated learning and parameter-efficient domain adaptation methods to solve the (1) massive data requirement of ASR models from user-specific scenarios and (2) the substantial communication cost between servers and c…

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2310.11739  [pdf, other]

    cs.LG cs.SD eess.AS

    Unintended Memorization in Large ASR Models, and How to Mitigate It

    Authors: Lun Wang, Om Thakkar, Rajiv Mathews

    Abstract: It is well-known that neural networks can unintentionally memorize their training examples, causing privacy concerns. However, auditing memorization in large non-auto-regressive automatic speech recognition (ASR) models has been challenging due to the high compute cost of existing methods such as hardness calibration. In this work, we design a simple auditing method to measure memorization in larg…

    Submitted 18 October, 2023; originally announced October 2023.

  4. arXiv:2310.00141  [pdf, other]

    cs.CL eess.AS

    The Gift of Feedback: Improving ASR Model Quality by Learning from User Corrections through Federated Learning

    Authors: Lillian Zhou, Yuxin Ding, Mingqing Chen, Harry Zhang, Rohit Prabhavalkar, Dhruv Guliani, Giovanni Motta, Rajiv Mathews

    Abstract: Automatic speech recognition (ASR) models are typically trained on large datasets of transcribed speech. As language evolves and new terms come into use, these models can become outdated and stale. In the context of models trained on the server but deployed on edge devices, errors may result from the mismatch between server training data and actual on-device usage. In this work, we seek to continu…

    Submitted 30 November, 2023; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: Accepted to IEEE ASRU 2023

  5. arXiv:2208.03067  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

    Authors: Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

    Abstract: Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the required data is also only available for a few languages. We have experimented with two techniques which may provide pathways to large vocabulary speech recognition for African languages: multilingual modeling and self-supervised learning. We gathered available open source data…

    Submitted 4 October, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  6. arXiv:2207.00706  [pdf, other]

    eess.AS cs.CL cs.LG

    UserLibri: A Dataset for ASR Personalization Using Only Text

    Authors: Theresa Breiner, Swaroop Ramaswamy, Ehsan Variani, Shefali Garg, Rajiv Mathews, Khe Chai Sim, Kilol Gupta, Mingqing Chen, Lara McConnaughey

    Abstract: Personalization of speech models on mobile devices (on-device personalization) is an active area of research, but more often than not, mobile devices have more text-only data than paired audio-text data. We explore training a personalized language model on text-only data, used during inference to improve speech recognition performance for that user. We experiment on a user-clustered LibriSpeech co…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication in Interspeech 2022. 9 total pages with appendix, 9 total tables, 5 total figures

  7. arXiv:2204.09606  [pdf, other]

    cs.CL cs.CR cs.LG cs.SD eess.AS

    Detecting Unintended Memorization in Language-Model-Fused ASR

    Authors: W. Ronny Huang, Steve Chien, Om Thakkar, Rajiv Mathews

    Abstract: End-to-end (E2E) models are often accompanied by language models (LMs) via shallow fusion to boost their overall quality as well as their recognition of rare words. At the same time, several prior works show that LMs are susceptible to unintentionally memorizing rare or unique sequences in the training data. In this work, we design a framework for detecting memorization of random textual seque…

    Submitted 28 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

    Comments: Interspeech 2022

  8. arXiv:2204.08345  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Extracting Targeted Training Data from ASR Models, and How to Mitigate It

    Authors: Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

    Abstract: Recent work has designed methods to demonstrate that model updates in ASR training can leak potentially sensitive attributes of the utterances used in computing the updates. In this work, we design the first method to demonstrate information leakage about training data from trained ASR models. We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data…

    Submitted 27 June, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: Accepted to appear at Interspeech'22

  9. arXiv:2204.06322  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training

    Authors: Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays

    Abstract: We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training. And to learn in the absence of curated labels on-device, we formulated a confidence filtering str…

    Submitted 29 June, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to Interspeech 2022

  10. arXiv:2109.01309  [pdf]

    eess.IV cs.CV

    Unsupervised multi-latent space reinforcement learning framework for video summarization in ultrasound imaging

    Authors: Roshan P Mathews, Mahesh Raveendranatha Panicker, Abhilash R Hareendranathan, Yale Tung Chen, Jacob L Jaremko, Brian Buchanan, Kiran Vishnu Narayan, Kesavadas C, Greeta Mathews

    Abstract: The COVID-19 pandemic has highlighted the need for a tool to speed up triage in ultrasound scans and provide clinicians with fast access to relevant information. The proposed video-summarization technique is a step in this direction that provides clinicians access to relevant key-frames from a given ultrasound scan (such as lung ultrasound) while reducing resource, storage and bandwidth requiremen…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: 24 pages, submitted to Elsevier Medical Image Analysis for review

  11. arXiv:2106.07006  [pdf]

    eess.SP

    Towards Fast Region Adaptive Ultrasound Beamformer for Plane Wave Imaging Using Convolutional Neural Networks

    Authors: Roshan P Mathews, Mahesh Raveendranatha Panicker

    Abstract: Automatic learning algorithms for improving the image quality of diagnostic B-mode ultrasound (US) images have been gaining popularity in the recent past. In this work, a novel convolutional neural network (CNN) is trained using time-of-flight corrected in-vivo receiver data of plane wave transmit to produce corresponding high-quality minimum variance distortionless response (MVDR) beamformed ima…

    Submitted 17 August, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

    Comments: 4 pages, 4 figures, accepted in IEEE EMBC 2021

  12. arXiv:2009.14657  [pdf]

    eess.IV

    CAD Applications and Emerging Research Potential in Medical Imaging

    Authors: Roshan P. Mathews, Greeta Mathews

    Abstract: Computer Aided Detection (CAD) is a valuable technique for precisely interpreting medical images and it has a global business opportunity of about USD 1.8 billion. The current aspects with reference to the four sub-stages such as image pre-processing, segmentation, feature extraction and classification and the future scope of CAD in medical imaging have been discussed in this paper. Many reviewers…

    Submitted 30 September, 2020; originally announced September 2020.

    Comments: 14 pages, 11 figures

  13. arXiv:2005.10406  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Training Keyword Spotting Models on Non-IID Data with Federated Learning

    Authors: Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews

    Abstract: We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model. To overcome the algorithmic constraints associated with fitting on-device data (which are inherently non-independent and identically distributed), we conduct thorough empirical studies of optimizat…

    Submitted 4 June, 2020; v1 submitted 20 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020