
Showing 1–8 of 8 results for author: Ravenscroft, W

Searching in archive eess.
  1. arXiv:2406.08914 [pdf, other]

    cs.SD cs.LG eess.AS

    Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition

    Authors: William Ravenscroft, George Close, Stefan Goetze, Thomas Hain, Mohammad Soleymanpour, Anurag Chowdhury, Mark C. Fuhs

    Abstract: One solution to automatic speech recognition (ASR) of overlapping speakers is to separate speech and then perform ASR on the separated signals. Commonly, the separator produces artefacts which often degrade ASR performance. Addressing this issue typically requires reference transcriptions to jointly train the separation and ASR networks. This is often not viable for training on real-world in-domai…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 5 pages, 3 Figures, 3 Tables, Accepted for Interspeech 2024

  2. arXiv:2312.08979 [pdf, ps, other]

    cs.SD eess.AS

    Multi-CMGAN+/+: Leveraging Multi-Objective Speech Quality Metric Prediction for Speech Enhancement

    Authors: George Close, William Ravenscroft, Thomas Hain, Stefan Goetze

    Abstract: Neural network based approaches to speech enhancement have been shown to be particularly powerful, being able to leverage a data-driven approach to result in a significant performance gain versus other approaches. Such approaches are reliant on artificially created labelled training data such that the neural model can be trained using intrusive loss functions which compare the output of the model with…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted @ ICASSP 2024

  3. arXiv:2310.06125 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

    Authors: William Ravenscroft, Stefan Goetze, Thomas Hain

    Abstract: Speech separation remains an important topic for multi-speaker technology researchers. Convolution augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use o…

    Submitted 9 October, 2023; originally announced October 2023.

    Comments: Accepted at ASRU Workshop 2023

  4. arXiv:2304.07142 [pdf, other]

    cs.SD cs.AI cs.LG cs.NE eess.AS

    On Data Sampling Strategies for Training Neural Network Speech Separation Models

    Authors: William Ravenscroft, Stefan Goetze, Thomas Hain

    Abstract: Speech separation remains an important area of multi-speaker signal processing. Deep neural network (DNN) models have attained the best performance on many speech separation benchmarks. Some of these models can take significant time to train and have high memory requirements. Previous work has proposed shortening training examples to address these issues but the impact of this on model performance…

    Submitted 16 June, 2023; v1 submitted 14 April, 2023; originally announced April 2023.

    Comments: Accepted for EUSIPCO 2023

  5. arXiv:2301.04388 [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

    Authors: George Close, William Ravenscroft, Thomas Hain, Stefan Goetze

    Abstract: Recent work in the domain of speech enhancement has explored the use of self-supervised speech representations to aid in the training of neural speech enhancement models. However, much of this work focuses on using the deepest or final outputs of self-supervised speech representation models, rather than the earlier feature encodings. The use of self-supervised representations in such a way is ofte…

    Submitted 26 June, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

    Comments: 4 pages, accepted at ICASSP 2023

    Journal ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  6. arXiv:2210.15305 [pdf, other]

    cs.SD cs.AI eess.AS

    Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

    Authors: William Ravenscroft, Stefan Goetze, Thomas Hain

    Abstract: Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models known as temporal convolutional networks (TCNs) has shown promising results for speech separation tasks. A limitation of these models is that…

    Submitted 10 March, 2023; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Accepted for ICASSP 2023

  7. arXiv:2205.08455 [pdf, other]

    cs.SD cs.LG eess.AS

    Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation

    Authors: William Ravenscroft, Stefan Goetze, Thomas Hain

    Abstract: Speech dereverberation is an important stage in many speech technology applications. Recent work in this area has been dominated by deep neural network models. Temporal convolutional networks (TCNs) are deep learning models that have been proposed for sequence modelling in the task of dereverberating speech. In this work a weighted multi-dilation depthwise-separable convolution is proposed to repl…

    Submitted 22 July, 2022; v1 submitted 17 May, 2022; originally announced May 2022.

    Comments: Accepted at IWAENC 2022

  8. arXiv:2204.06439 [pdf, other]

    cs.SD cs.LG eess.AS

    Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation

    Authors: William Ravenscroft, Stefan Goetze, Thomas Hain

    Abstract: Speech dereverberation is often an important requirement in robust speech processing tasks. Supervised deep learning (DL) models give state-of-the-art performance for single-channel speech dereverberation. Temporal convolutional networks (TCNs) are commonly used for sequence modelling in speech enhancement tasks. A feature of TCNs is that they have a receptive field (RF) dependent on the specific…

    Submitted 1 July, 2022; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Accepted at EUSIPCO 2022