Showing 1–18 of 18 results for author: Garner, P N

  1. arXiv:2410.06846  [pdf, other]

    cs.CL cs.AI cs.LG cs.SD eess.AS

    Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity

    Authors: Mutian He, Philip N. Garner

    Abstract: Architectures such as Linformer and Mamba have recently emerged as competitive linear time replacements for transformers. However, corresponding large pretrained models are often unavailable, especially in non-text domains. To remedy this, we present a Cross-Architecture Layerwise Distillation (CALD) approach that jointly converts a transformer model to a linear time substitute and fine-tunes it t…

    Submitted 13 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

    Comments: 18 pages, 5 figures; ICLR 2025 camera ready. Code: https://github.com/idiap/linearize-distill-pretrained-transformers

  2. arXiv:2409.10673  [pdf, other]

    cs.LG cs.CL stat.ML

    A Bayesian Interpretation of Adaptive Low-Rank Adaptation

    Authors: Haolin Chen, Philip N. Garner

    Abstract: Motivated by the sensitivity-based importance score of the adaptive low-rank adaptation (AdaLoRA), we utilize more theoretically supported metrics, including the signal-to-noise ratio (SNR), along with the Improved Variational Online Newton (IVON) optimizer, for adaptive parameter budget allocation. The resulting Bayesian counterpart not only has matched or surpassed the performance of using the s…

    Submitted 11 January, 2025; v1 submitted 16 September, 2024; originally announced September 2024.

    Comments: ICASSP 2025

  3. arXiv:2409.05589  [pdf, other]

    eess.AS

    An investigation of modularity for noise robustness in conformer-based ASR

    Authors: Louise Coppieters de Gibson, Philip N. Garner, Pierre-Edouard Honnet

    Abstract: Whilst state of the art automatic speech recognition (ASR) can perform well, it still degrades when exposed to acoustic environments that differ from those used when training the model. Unfamiliar environments for a given model may well be known a-priori, but yield comparatively small amounts of adaptation data. In this experimental study, we investigate to what extent recent formalisations of mod…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: 5 pages, 3 figures

  4. Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks

    Authors: Alexandre Bittar, Philip N. Garner

    Abstract: Understanding cognitive processes in the brain demands sophisticated models capable of replicating neural dynamics at large scales. We present a physiologically inspired speech recognition architecture, compatible and scalable with deep learning frameworks, and demonstrate that end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.…

    Submitted 2 September, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Journal ref: Frontiers in Neuroscience, Vol. 18 (2024)

  5. arXiv:2402.12220  [pdf, ps, other]

    eess.AS cs.LG

    Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting

    Authors: Haolin Chen, Philip N. Garner

    Abstract: We are motivated primarily by the adaptation of text-to-speech synthesis models; however we argue that more generic parameter-efficient fine-tuning (PEFT) is an appropriate framework to do such adaptation. Nevertheless, catastrophic forgetting remains an issue with PEFT, damaging the pre-trained model's inherent capabilities. We demonstrate that existing Bayesian learning techniques can be applied…

    Submitted 6 December, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

  6. arXiv:2311.17655  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes

    Authors: Pavel Korshunov, Haolin Chen, Philip N. Garner, Sebastien Marcel

    Abstract: The task of deepfakes detection is far from being solved by speech or vision researchers. Several publicly available databases of fake synthetic video and speech were built to aid the development of detection methods. However, existing databases typically focus on visual or voice modalities and provide no proof that their deepfakes can in fact impersonate any real person. In this paper, we present…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 10 pages, 3 figures, 3 tables

    ACM Class: I.4.3; I.2.10; H.5.1

  7. arXiv:2305.13512  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding

    Authors: Mutian He, Philip N. Garner

    Abstract: Recently, large pretrained language models have demonstrated strong language understanding capabilities. This is particularly reflected in their zero-shot and in-context learning abilities on downstream tasks through prompting. To assess their impact on spoken language understanding (SLU), we evaluate several such models like ChatGPT and OPT of different sizes on multiple benchmarks. We verify the…

    Submitted 17 August, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: 6 pages, 2 figures; Accepted by Interspeech 2023

  8. arXiv:2305.09652  [pdf, other]

    cs.CL cs.SD eess.AS

    The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation

    Authors: Mutian He, Philip N. Garner

    Abstract: End-to-end spoken language understanding (SLU) remains elusive even with current large pretrained language models on text and speech, especially in multilingual cases. Machine translation has been established as a powerful pretraining objective on text as it enables the model to capture high-level semantics of the input utterance and associations between different languages, which is desired for s…

    Submitted 17 October, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: 16 pages, 3 figures; accepted by Findings of EMNLP 2023

  9. arXiv:2303.01849  [pdf, other]

    eess.AS cs.SD

    An investigation into the adaptability of a diffusion-based TTS model

    Authors: Haolin Chen, Philip N. Garner

    Abstract: Given the recent success of diffusion in producing natural-sounding synthetic speech, we investigate how diffusion can be used in speaker adaptive TTS. Taking cues from more traditional adaptation approaches, we show that adaptation can be included in a diffusion pipeline using conditional layer normalization with a step embedding. However, we show experimentally that, whilst the approach has meri…

    Submitted 3 March, 2023; originally announced March 2023.

  10. arXiv:2212.01187  [pdf, other]

    cs.CL cs.LG cs.NE cs.SD eess.AS

    Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition

    Authors: Alexandre Bittar, Philip N. Garner

    Abstract: Compared to conventional artificial neurons that produce dense and real-valued responses, biologically-inspired spiking neurons transmit sparse and binary information, which can also lead to energy-efficient implementations. Recent research has shown that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method. They have shown promising re…

    Submitted 16 February, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  11. arXiv:2208.11700  [pdf, ps, other]

    q-bio.NC cs.AI cs.SD eess.AS

    Low-Level Physiological Implications of End-to-End Learning of Speech Recognition

    Authors: Louise Coppieters de Gibson, Philip N. Garner

    Abstract: Current speech recognition architectures perform very well from the point of view of machine learning, hence user interaction. This suggests that they are emulating the human biological system well. We investigate whether the inference can be inverted to provide insights into that biological system; in particular the hearing mechanism. Using SincNet, we confirm that end-to-end systems do learn wel…

    Submitted 22 August, 2022; originally announced August 2022.

    Comments: Submitted to INTERSPEECH 2022

  12. Bayesian Recurrent Units and the Forward-Backward Algorithm

    Authors: Alexandre Bittar, Philip N. Garner

    Abstract: Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm. The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks, while retaining a probabilistic interpretation from the direct correspondence with hidden Markov models. Whilst the contribution is mainly theoretical…

    Submitted 21 July, 2022; originally announced July 2022.

    Comments: Submitted to INTERSPEECH 2022

  13. arXiv:2006.05389  [pdf, other]

    eess.SP cs.CV cs.LG stat.ML

    A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers

    Authors: Niccolò Antonello, Philip N. Garner

    Abstract: Neural Network (NN) classifiers can assign extreme probabilities to samples that have not appeared during training (out-of-distribution samples) resulting in erroneous and unreliable predictions. One of the causes for this unwanted behaviour lies in the use of the standard softmax operator which pushes the posterior probabilities to be either zero or unity hence failing to model uncertainty. The s…

    Submitted 9 October, 2020; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 5 pages, 5 figures, to be published in IEEE Signal Processing Letters, reproducible code https://github.com/idiap/tsoftmax

  14. A Bayesian Approach to Recurrence in Neural Networks

    Authors: Philip N. Garner, Sibo Tong

    Abstract: We begin by reiterating that common neural network activation functions have simple Bayesian origins. In this spirit, we go on to show that Bayes's theorem also implies a simple recurrence relation; this leads to a Bayesian recurrent unit with a prescribed feedback formulation. We show that introduction of a context indicator leads to a variable feedback that is similar to the forget mechanism in…

    Submitted 20 April, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

  15. arXiv:1806.08685  [pdf, other]

    eess.AS cs.SD

    A Variational Prosody Model for Mapping the Context-Sensitive Variation of Functional Prosodic Prototypes

    Authors: Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, Philip N. Garner

    Abstract: The quest for comprehensive generative models of intonation that link linguistic and paralinguistic functions to prosodic forms has been a longstanding challenge of speech communication research. Traditional intonation models have given way to the overwhelming performance of deep learning (DL) techniques for training general purpose end-to-end mappings using millions of tunable parameters. The shi…

    Submitted 18 March, 2019; v1 submitted 22 June, 2018; originally announced June 2018.

    Comments: Updated with recurrent version of contour generators, unified prosodic latent space, and performance evaluation with baseline

  16. arXiv:1711.10025  [pdf, other]

    eess.AS cs.SD

    Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

    Authors: Sibo Tong, Philip N. Garner, Hervé Bourlard

    Abstract: Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as…

    Submitted 23 January, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

  17. Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

    Authors: Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, Philip N. Garner

    Abstract: Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition/synthesis techniques. This allows transmission of information (such as phonemes) segment by segment that decreases the bit rate. However, the encoder based on a phoneme speech recognition may create bursts of segmental errors. Segmental errors are further propagated to optional supras…

    Submitted 29 August, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

    Report number: Idiap-RR-11-2016

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume: 24, Issue: 12, Dec. 2016

  18. arXiv:1409.0203  [pdf, other]

    cs.SD cs.LG

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    Authors: Mohammad J. Taghizadeh, Reza Parhizkar, Philip N. Garner, Herve Bourlard, Afsaneh Asaei

    Abstract: This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries based on a novel Euclidean distance matrix completion algorithm by alternative low-rank matrix completion and projection onto the Euclidean…

    Submitted 31 August, 2014; originally announced September 2014.

    Comments: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 2014