-
Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
Authors:
Brandon Victor,
Mathilde Letard,
Peter Naylor,
Karim Douch,
Nicolas Longépé,
Zhen He,
Patrick Ebel
Abstract:
Floods are among the most common and devastating natural hazards, imposing immense costs on our society and economy due to their disastrous consequences. Recent progress in weather prediction and spaceborne flood mapping demonstrated the feasibility of anticipating extreme events and reliably detecting their catastrophic effects afterwards. However, these efforts are rarely linked to one another and there is a critical lack of datasets and benchmarks to enable the direct forecasting of flood extent. To resolve this issue, we curate a novel dataset enabling a timely prediction of flood extent. Furthermore, we provide a representative evaluation of state-of-the-art methods, structured into two benchmark tracks for forecasting flood inundation maps i) in general and ii) focused on coastal regions. Altogether, our dataset and benchmark provide a comprehensive platform for evaluating flood forecasts, enabling future solutions for this critical challenge. Data, code & models are shared at https://github.com/Multihuntr/GFF under a CC0 license.
Submitted 27 September, 2024;
originally announced September 2024.
-
Zero Shot Text to Speech Augmentation for Automatic Speech Recognition on Low-Resource Accented Speech Corpora
Authors:
Francesco Nespoli,
Daniel Barreda,
Patrick A. Naylor
Abstract:
In recent years, automatic speech recognition (ASR) models greatly improved transcription performance both in clean, low noise, acoustic conditions and in reverberant environments. However, all these systems rely on the availability of hundreds of hours of labelled training data in specific acoustic conditions. When such a training dataset is not available, the performance of the system is heavily impacted. For example, this happens when a specific acoustic environment or a particular population of speakers is under-represented in the training dataset. Specifically, in this paper we investigate the effect of accented speech data on an off-the-shelf ASR system. Furthermore, we suggest a strategy based on zero-shot text-to-speech to augment the accented speech corpora. We show that this augmentation method is able to mitigate the loss in performance of the ASR system on accented data up to 5% word error rate reduction (WERR). In conclusion, we demonstrate that by incorporating a modest fraction of real with synthetically generated data, the ASR system exhibits superior performance compared to a model trained exclusively on authentic accented speech with up to 14% WERR.
Submitted 17 September, 2024;
originally announced September 2024.
-
XANE Background Acoustic Embeddings: Ablation and Clustering Analysis
Authors:
Dushyant Sharma,
James Fosburgh,
Sri Harsha Dumpala,
Chandramouli Shama Sastri,
Stanislav Yu. Kruchinin,
Patrick A. Naylor
Abstract:
We explore the recently proposed explainable acoustic neural embedding (XANE) system that models the background acoustics of a speech signal in a non-intrusive manner. The XANE embeddings are used to estimate specific parameters related to the background acoustic properties of the signal, which allows the embeddings to be explainable in terms of those parameters. We perform ablation studies on the XANE system and show that estimating all acoustic parameters jointly has an overall positive effect. Furthermore, we illustrate the value of XANE embeddings by performing clustering experiments on unseen test data and show that the proposed embeddings achieve a mean F1 score of 92% for three different tasks, significantly outperforming the WavLM-based signal embeddings, and are complementary to speaker embeddings.
Submitted 8 July, 2024;
originally announced July 2024.
-
Exotic definite four-manifolds with non-cyclic fundamental group
Authors:
Robert Harris,
Patrick Naylor,
B. Doug Park
Abstract:
We construct infinitely many pairwise non-diffeomorphic smooth structures on a definite $4$-manifold with non-cyclic fundamental group $\mathbb{Z}/2\times \mathbb{Z}/2$.
Submitted 10 June, 2024;
originally announced June 2024.
-
XANE: eXplainable Acoustic Neural Embeddings
Authors:
Sri Harsha Dumpala,
Dushyant Sharma,
Chandramouli Shama Sastri,
Stanislav Kruchinin,
James Fosburgh,
Patrick A. Naylor
Abstract:
We present a novel method for extracting neural embeddings that model the background acoustics of a speech signal. The extracted embeddings are used to estimate specific parameters related to the background acoustic properties of the signal in a non-intrusive manner, which allows the embeddings to be explainable in terms of those parameters. We illustrate the value of these embeddings by performing clustering experiments on unseen test data and show that the proposed embeddings achieve a mean F1 score of 95.2% for three different tasks, significantly outperforming the WavLM-based signal embeddings. We also show that the proposed method can explain the embeddings by estimating 14 acoustic parameters characterizing the background acoustics, including reverberation and noise levels, overlapped speech detection, CODEC type detection and noise type detection with high accuracy and a real-time factor 17 times lower than an external baseline method.
Submitted 7 June, 2024;
originally announced June 2024.
-
Steered Response Power for Sound Source Localization: A Tutorial Review
Authors:
Eric Grinstein,
Elisa Tengan,
Bilgesu Çakmak,
Thomas Dietzen,
Leonardo Nunes,
Toon van Waterschoot,
Mike Brookes,
Patrick A. Naylor
Abstract:
In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance on moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm which allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm which includes selected extensions from the literature.
Submitted 9 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Implicit Assimilation of Sparse In Situ Data for Dense & Global Storm Surge Forecasting
Authors:
Patrick Ebel,
Brandon Victor,
Peter Naylor,
Gabriele Meoni,
Federico Serva,
Rochelle Schneider
Abstract:
Hurricanes and coastal floods are among the most disastrous natural hazards. Both are intimately related to storm surges, as their causes and effects, respectively. However, the short-term forecasting of storm surges has proven challenging, especially when targeting previously unseen locations or sites without tidal gauges. Furthermore, recent work improved short and medium-term weather forecasting but the handling of raw unassimilated data remains non-trivial. In this paper, we tackle both challenges and demonstrate that neural networks can implicitly assimilate sparse in situ tide gauge data with coarse ocean state reanalysis in order to forecast storm surges. We curate a global dataset to learn and validate the dense prediction of storm surges, building on preceding efforts. Unlike prior work limited to known gauges, our approach extends to ungauged sites, paving the way for global storm surge forecasting.
Submitted 5 April, 2024;
originally announced April 2024.
-
The Neural-SRP method for positional sound source localization
Authors:
Eric Grinstein,
Toon van Waterschoot,
Mike Brookes,
Patrick A. Naylor
Abstract:
Steered Response Power (SRP) is a widely used method for the task of sound source localization using microphone arrays, showing satisfactory localization performance on many practical scenarios. However, its performance is diminished under highly reverberant environments. Although Deep Neural Networks (DNNs) have been previously proposed to overcome this limitation, most are trained for a specific number of microphones with fixed spatial coordinates. This restricts their practical application on scenarios frequently observed in wireless acoustic sensor networks, where each application has an ad-hoc microphone topology. We propose Neural-SRP, a DNN which combines the flexibility of SRP with the performance gains of DNNs. We train our network using simulated data and transfer learning, and evaluate our approach on recorded and simulated data. Results verify that Neural-SRP's localization performance significantly outperforms the baselines.
Submitted 14 March, 2024;
originally announced March 2024.
-
Binaural Speech Enhancement Using Deep Complex Convolutional Transformer Networks
Authors:
Vikas Tokala,
Eric Grinstein,
Mike Brookes,
Simon Doclo,
Jesper Jensen,
Patrick A. Naylor
Abstract:
Studies have shown that in noisy acoustic environments, providing binaural signals to the user of an assistive listening device may improve speech intelligibility and spatial awareness. This paper presents a binaural speech enhancement method using a complex convolutional neural network with an encoder-decoder architecture and a complex multi-head attention transformer. The model is trained to estimate individual complex ratio masks in the time-frequency domain for the left and right-ear channels of binaural hearing devices. The model is trained using a novel loss function that incorporates the preservation of spatial information along with speech intelligibility improvement and noise reduction. Simulation results for acoustic scenarios with a single target speaker and isotropic noise of various types show that the proposed method improves the estimated binaural speech intelligibility and preserves the binaural cues better in comparison with several baseline algorithms.
Submitted 8 March, 2024;
originally announced March 2024.
-
Uncertainty Quantification in Machine Learning for Joint Speaker Diarization and Identification
Authors:
Simon W. McKnight,
Aidan O. T. Hogg,
Vincent W. Neo,
Patrick A. Naylor
Abstract:
This paper studies modulation spectrum features ($Φ$) and mel-frequency cepstral coefficients ($Ψ$) in joint speaker diarization and identification (JSID). JSID is important because speaker diarization on its own, which only distinguishes speakers, is insufficient for many applications; it is often necessary to identify speakers as well. Machine learning models are set up using convolutional neural networks (CNNs) on $Φ$ and recurrent neural networks – long short-term memory (LSTMs) on $Ψ$, then concatenating into fully connected layers.
Experiment 1 shows models on both $Φ$ and $Ψ$ have better diarization error rates (DERs) than models on either alone; a CNN on $Φ$ has DER 29.09%, compared to 27.78% for an LSTM on $Ψ$ and 19.44% for a model on both. Experiment 1 also investigates aleatoric uncertainties and shows the model on both $Φ$ and $Ψ$ has mean entropy 0.927 bits (out of 4 bits) for correct predictions compared to 1.896 bits for incorrect predictions which, along with entropy histogram shapes, shows the model helpfully indicates where it is uncertain.
Experiment 2 investigates epistemic uncertainties as well as aleatoric using Monte Carlo dropout (MCD). It compares models on both $Φ$ and $Ψ$ with models trained on x-vectors ($X$), before applying Kalman filter smoothing on epistemic uncertainties for resegmentation and model ensembles. While the two models on $X$ (DERs 10.23% and 9.74%) outperform those on $Φ$ and $Ψ$ (DER 17.85%) after their individual Kalman filter smoothing, combining them using a Kalman filter smoothing method improves the DER to 9.29%. Aleatoric uncertainties are higher for incorrect predictions.
Both experiments show models on $Φ$ do not distinguish overlapping speakers as well as anticipated. However, Experiment 2 shows model ensembles do better with overlapping speakers than individual models do.
Submitted 30 December, 2023; v1 submitted 27 December, 2023;
originally announced December 2023.
-
Subspace Hybrid MVDR Beamforming for Augmented Hearing
Authors:
Sina Hafezi,
Alastair H. Moore,
Pierre H. Guiraud,
Patrick A. Naylor,
Jacob Donley,
Vladimir Tourbabin,
Thomas Lunner
Abstract:
Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforward. The design of robust, high-performance, adaptive beamformers for such scenarios is an on-going challenge. This is due to the violation of the typically required assumptions on the noise field caused by, for example, rapid variations resulting from complex acoustic environments, and/or rotations of the listener's head. This work proposes a multi-channel speech enhancement algorithm which utilises the adaptability of signal-dependent beamformers while still benefiting from the computational efficiency and robust performance of signal-independent super-directive beamformers. The algorithm has two stages. (i) The first stage is a hybrid beamformer based on a dictionary of weights corresponding to a set of noise field models. (ii) The second stage is a wide-band subspace post-filter to remove any artifacts resulting from (i). The algorithm is evaluated using both real-world recordings and simulations of a cocktail-party scenario. Noise suppression, intelligibility and speech quality results show a significant performance improvement by the proposed algorithm compared to the baseline super-directive beamformer. A data-driven implementation of the noise field dictionary is shown to provide more noise suppression, and similar speech intelligibility and quality, compared to a parametric dictionary.
Submitted 30 November, 2023;
originally announced November 2023.
-
Deep-Learning-based Change Detection with Spaceborne Hyperspectral PRISMA data
Authors:
J. F. Amieva,
A. Austoni,
M. A. Brovelli,
L. Ansalone,
P. Naylor,
F. Serva,
B. Le Saux
Abstract:
Change detection (CD) methods have been applied to optical data for decades, while the use of hyperspectral data with a fine spectral resolution has been rarely explored. CD is applied in several sectors, such as environmental monitoring and disaster management. Thanks to the PRecursore IperSpettrale della Missione operativA (PRISMA), hyperspectral-from-space CD is now possible. In this work, we apply standard and deep-learning (DL) CD methods to different targets, from natural to urban areas. We propose a pipeline starting from coregistration, followed by CD with a full-spectrum algorithm and by a DL network developed for optical data. We find that changes in vegetation and built environments are well captured. The spectral information is valuable to identify subtle changes and the DL methods are less affected by noise compared to the statistical method, but atmospheric effects and the lack of reliable ground truth represent a major challenge to hyperspectral CD.
Submitted 20 October, 2023;
originally announced October 2023.
-
Dual input neural networks for positional sound source localization
Authors:
Eric Grinstein,
Vincent W. Neo,
Patrick A. Naylor
Abstract:
In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method as well as a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
Submitted 8 August, 2023;
originally announced August 2023.
-
Implicit neural representation for change detection
Authors:
Peter Naylor,
Diego Di Carlo,
Arianna Traviglia,
Makoto Yamada,
Marco Fiorucci
Abstract:
Identifying changes in a pair of 3D aerial LiDAR point clouds, obtained during two distinct time periods over the same geographic region presents a significant challenge due to the disparities in spatial coverage and the presence of noise in the acquisition system. The most commonly used approaches to detecting changes in point clouds are based on supervised methods which necessitate extensive labelled data often unavailable in real-world applications. To address these issues, we propose an unsupervised approach that comprises two components: Implicit Neural Representation (INR) for continuous shape reconstruction and a Gaussian Mixture Model for categorising changes. INR offers a grid-agnostic representation for encoding bi-temporal point clouds, with unmatched spatial support that can be regularised to enhance high-frequency details and reduce noise. The reconstructions at each timestamp are compared at arbitrary spatial scales, leading to a significant increase in detection capabilities. We apply our method to a benchmark dataset comprising simulated LiDAR point clouds for urban sprawling. This dataset encompasses diverse challenging scenarios, varying in resolutions, input modalities and noise levels. This enables a comprehensive multi-scenario evaluation, comparing our method with the current state-of-the-art approach. We outperform the previous methods by a margin of 10% in the intersection over union metric. In addition, we put our techniques to practical use by applying them in a real-world scenario to identify instances of illicit excavation of archaeological sites and validate our results by comparing them with findings from field experts.
Submitted 30 August, 2023; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Doubles of Gluck twists: a five dimensional approach
Authors:
David Gabai,
Patrick Naylor,
Hannah Schwartz
Abstract:
Using a 5-dimensional perspective, we balance algebraic and geometric handle cancellation to show that doubles of Gluck twists of certain 2-spheres with two minima are standard. This includes all 2-spheres which are unions of ribbon discs, one of which has undisking number one. As an application, we produce new examples of Schoenflies balls not known to be standard.
Submitted 12 July, 2023;
originally announced July 2023.
-
Graph neural networks for sound source localization on distributed microphone networks
Authors:
Eric Grinstein,
Mike Brookes,
Patrick A. Naylor
Abstract:
Distributed Microphone Arrays (DMAs) present many challenges with respect to centralized microphone arrays. An important requirement of applications on these arrays is handling a variable number of input channels. We consider the use of Graph Neural Networks (GNNs) as a solution to this challenge. We present a localization method using the Relation Network GNN, which we show shares many similarities to classical signal processing algorithms for Sound Source Localization (SSL). We apply our method for the task of SSL and validate it experimentally using an unseen number of microphones. We test different feature extractors and show that our approach significantly outperforms classical baselines.
Submitted 28 June, 2023;
originally announced June 2023.
-
Long-term Conversation Analysis: Exploring Utility and Privacy
Authors:
Francesco Nespoli,
Jule Pohlhausen,
Patrick A. Naylor,
Joerg Bitzer
Abstract:
The analysis of conversations recorded in everyday life requires privacy protection. In this contribution, we explore a privacy-preserving feature extraction method based on input feature dimension reduction, spectral smoothing and the low-cost speaker anonymization technique based on McAdams coefficient. We assess the utility of the feature extraction methods with a voice activity detection and a speaker diarization system, while privacy protection is determined with a speech recognition and a speaker verification model. We show that the combination of McAdams coefficient and spectral smoothing maintains the utility while improving privacy.
Submitted 28 June, 2023;
originally announced June 2023.
-
Two-Stage Voice Anonymization for Enhanced Privacy
Authors:
Francesco Nespoli,
Daniel Barreda,
Joerg Bitzer,
Patrick A. Naylor
Abstract:
In recent years, the need for privacy preservation when manipulating or storing personal data, including speech, has become a major issue. In this paper, we present a system addressing the speaker-level anonymization problem. We propose and evaluate a two-stage anonymization pipeline exploiting a state-of-the-art anonymization model described in the Voice Privacy Challenge 2022 in combination with a zero-shot voice conversion architecture able to capture speaker characteristics from a few seconds of speech. We show this architecture can lead to strong privacy preservation while preserving pitch information. Finally, we propose a new compressed metric to evaluate anonymization systems in privacy scenarios with different constraints on privacy and utility.
Submitted 28 June, 2023;
originally announced June 2023.
-
Subspace Hybrid Beamforming for Head-worn Microphone Arrays
Authors:
Sina Hafezi,
Alastair H. Moore,
Pierre Guiraud,
Patrick A. Naylor,
Jacob Donley,
Vladimir Tourbabin,
Thomas Lunner
Abstract:
A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Components Analysis (PCA) denoising. In the first stage, the Hybrid-MVDR performs multiple MVDRs using a dictionary of pre-defined noise field models and picks the minimum-power outcome, which benefits from the robustness of signal-independent beamforming and the performance of adaptive beamforming. In the second stage, the outcomes of Hybrid and Iso are jointly used in a two-channel PCA-based denoising to remove the 'musical noise' produced by the Hybrid beamformer. On a dataset of real 'cocktail-party' recordings with a head-worn array, the proposed method outperforms the baseline superdirective beamformer in noise suppression (fwSegSNR, SDR, SIR, SAR) and speech intelligibility (STOI) with similar speech quality (PESQ) improvement.
Submitted 15 March, 2023;
originally announced March 2023.
-
Optimal Transport for Change Detection on LiDAR Point Clouds
Authors:
Marco Fiorucci,
Peter Naylor,
Makoto Yamada
Abstract:
Unsupervised change detection between airborne LiDAR data points, taken at separate times over the same location, can be difficult due to unmatching spatial support and noise from the acquisition system. Most current approaches to detect changes in point clouds rely heavily on the computation of Digital Elevation Models (DEM) images and supervised methods. Obtaining a DEM leads to LiDAR informational loss due to pixelisation, and supervision requires large amounts of labelled data often unavailable in real-world scenarios. We propose an unsupervised approach based on the computation of the transport of 3D LiDAR points over two temporal supports. The method is based on unbalanced optimal transport and can be generalised to any change detection problem with LiDAR data. We apply our approach to publicly available datasets for monitoring urban sprawling in various noise and resolution configurations that mimic several sensors used in practice. Our method allows for unsupervised multi-class classification and outperforms the previous state-of-the-art unsupervised approaches by a significant margin.
Submitted 8 November, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Relative Acoustic Features for Distance Estimation in Smart-Homes
Authors:
Francesco Nespoli,
Daniel Barreda,
Patrick A. Naylor
Abstract:
Any audio recording encapsulates the unique fingerprint of the associated acoustic environment, namely the background noise and reverberation. Considering the scenario of a room equipped with a fixed smart speaker device with one or more microphones and a wearable smart device (watch, glasses or smartphone), we employed the improved proportionate normalized least mean square adaptive filter to estimate the relative room impulse response mapping the audio recordings of the two devices. We performed inter-device distance estimation by exploiting a new set of features obtained by extending the definition of some acoustic attributes of the room impulse response to its relative version. In combination with the sparseness measure of the estimated relative room impulse response, the relative features allow precise inter-device distance estimation which can be exploited for tasks such as best microphone selection or acoustic scene analysis. Experimental results from simulated rooms of different dimensions and reverberation times demonstrate the effectiveness of this computationally lightweight approach for smart home acoustic ranging applications.
Submitted 2 December, 2022;
originally announced December 2022.
-
Binaural Speech Enhancement Using STOI-Optimal Masks
Authors:
Vikas Tokala,
Mike Brookes,
Patrick A. Naylor
Abstract:
STOI-optimal masking has been previously proposed and developed for single-channel speech enhancement. In this paper, we consider the extension to the task of binaural speech enhancement in which spatial information is known to be important to speech understanding and therefore should be preserved by the enhancement processing. Masks are estimated for each of the binaural channels individually and a 'better-ear listening' mask is computed by choosing the maximum of the two masks. The estimated mask is used to supply probability information about the speech presence in each time-frequency bin to an Optimally-modified Log Spectral Amplitude (OM-LSA) enhancer. We show that using the proposed method for binaural signals with a directional noise not only improves the SNR of the noisy signal but also preserves the binaural cues and intelligibility.
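As an illustration of the mask-combination step mentioned in this abstract, the following minimal Python sketch takes the element-wise maximum of two time-frequency masks. The function name `better_ear_mask`, the list-of-lists layout (frames by frequency bins) and the example values are assumptions made for illustration; they are not taken from the paper, and the mask estimation and OM-LSA stages are not shown.

```python
def better_ear_mask(mask_left, mask_right):
    """Combine two time-frequency masks by taking the larger value in each bin.

    Both masks are lists of frames, each frame a list of per-frequency values.
    This layout is a hypothetical choice for the sketch.
    """
    if len(mask_left) != len(mask_right):
        raise ValueError("masks must have the same number of frames")
    combined = []
    for frame_left, frame_right in zip(mask_left, mask_right):
        if len(frame_left) != len(frame_right):
            raise ValueError("masks must have the same number of frequency bins")
        # Keep, per bin, whichever ear's mask value is larger.
        combined.append([max(l, r) for l, r in zip(frame_left, frame_right)])
    return combined


# Illustrative values only: two frames, two frequency bins per frame.
left = [[0.1, 0.8], [0.5, 0.2]]
right = [[0.3, 0.6], [0.4, 0.9]]
mask = better_ear_mask(left, right)
```

The same combined mask would then be applied to both channels, which is one plausible reading of how a shared mask helps keep inter-channel level differences intact.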
Submitted 30 September, 2022;
originally announced September 2022.
-
Scale dependant layer for self-supervised nuclei encoding
Authors:
Peter Naylor,
Yao-Hung Hubert Tsai,
Marick Laé,
Makoto Yamada
Abstract:
Recent developments in self-supervised learning give us the possibility to further reduce human intervention in multi-step pipelines where the focus revolves around particular objects of interest. In the present paper, the focus lies on the nuclei in histopathology images. In particular, we aim at extracting cellular information in an unsupervised manner for a downstream task. As nuclei present themselves in a variety of sizes, we propose a new Scale-dependant convolutional layer to bypass scaling issues when resizing nuclei. On three nuclei datasets, we benchmark the following methods: handcrafted, pre-trained ResNet, supervised ResNet and self-supervised features. We show that the proposed convolution layer boosts performance and that this layer combined with Barlow Twins allows for better nuclei encoding compared to the supervised paradigm in the low sample setting and outperforms all other proposed unsupervised methods. In addition, we extend the existing TNBC dataset to incorporate nuclei class annotation in order to enrich and publicly release a small sample setting dataset for nuclei segmentation and classification.
Submitted 22 July, 2022;
originally announced July 2022.
-
Spatial Processing Front-End For Distant ASR Exploiting Self-Attention Channel Combinator
Authors:
Dushyant Sharma,
Rong Gong,
James Fosburgh,
Stanislav Yu. Kruchinin,
Patrick A. Naylor,
Ljubomir Milanovic
Abstract:
We present a novel multi-channel front-end based on channel shortening with the Weighted Prediction Error (WPE) method followed by a fixed MVDR beamformer used in combination with a recently proposed self-attention-based channel combination (SACC) scheme, for tackling the distant ASR problem. We show that the proposed system used as part of a ContextNet-based end-to-end (E2E) ASR system outperforms leading ASR systems as demonstrated by a 21.6% reduction in relative WER on a multi-channel LibriSpeech playback dataset. We also show how dereverberation prior to beamforming is beneficial and compare the WPE method with a modified neural channel shortening approach. An analysis of the non-intrusive estimate of the signal C50 confirms that the 8-channel WPE method provides significant dereverberation of the signals (13.6 dB improvement). We also show how the weights of the SACC system allow the extraction of accurate spatial information which can be beneficial for other speech processing applications like diarization.
Submitted 25 March, 2022;
originally announced March 2022.
-
Trisections of non-orientable 4-manifolds
Authors:
Maggie Miller,
Patrick Naylor
Abstract:
We study trisections of smooth, compact non-orientable 4-manifolds, and introduce trisections of non-orientable 4-manifolds with boundary. In particular, we prove a non-orientable analogue of a classical theorem of Laudenbach-Poénaru. As a consequence, trisection diagrams and Kirby diagrams of closed non-orientable 4-manifolds exist. We discuss how the theory of trisections may be adapted to the setting of non-orientable 4-manifolds with many examples.
Submitted 14 October, 2020;
originally announced October 2020.
-
Multisections of 4-manifolds
Authors:
Gabriel Islambouli,
Patrick Naylor
Abstract:
We introduce multisections of smooth, closed 4-manifolds, which generalize trisections to decompositions with more than three pieces. This decomposition describes an arbitrary smooth, closed 4-manifold as a sequence of cut systems on a surface. We show how to carry out many smooth cut and paste operations in terms of these cut systems. In particular, we show how to implement a cork twist, whereby we show that an arbitrary exotic pair of smooth 4-manifolds admit 4-sections differing only by one cut system. By carrying out fiber sums and log transforms, we also show that the elliptic fibrations $E(n)_{p,q}$ all admit genus 3 multisections, and draw explicit diagrams for these manifolds.
Submitted 6 October, 2020;
originally announced October 2020.
-
Gluck twisting roll spun knots
Authors:
Patrick Naylor,
Hannah Schwartz
Abstract:
We show that the smooth homotopy 4-sphere obtained by Gluck twisting the m-twist n-roll spin of any unknotting number one knot is diffeomorphic to the standard 4-sphere, for any pair of integers (m,n). It follows as a corollary that an infinite collection of twisted doubles of Gompf's infinite order corks are standard.
Submitted 11 September, 2020;
originally announced September 2020.
-
Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis
Authors:
Costas Yiallourides,
Patrick A. Naylor
Abstract:
Objective: In this work the potential of non-invasive detection of knee osteoarthritis is investigated using the sounds generated by the knee joint during walking. Methods: The information contained in the time-frequency domain of these signals and its compressed representations is exploited and their discriminant properties are studied. Their efficacy for the task of normal vs abnormal signal classification is evaluated using a comprehensive experimental framework. Based on this, the impact of the feature extraction parameters on the classification performance is investigated using Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA) and Support Vector Machine (SVM) classifiers. Results: It is shown that classification is successful with an area under the Receiver Operating Characteristic (ROC) curve of 0.92. Conclusion: The analysis indicates improvements in classification performance when using non-uniform frequency scaling and identifies specific frequency bands that contain discriminative features. Significance: Contrary to other studies that focus on sit-to-stand movements and knee flexion/extension, this study used knee sounds obtained during walking. The analysis of such signals leads to non-invasive detection of knee osteoarthritis with high accuracy and could potentially extend the range of available tools for the assessment of the disease as a more practical and cost effective method without requiring clinical setups.
Submitted 27 April, 2020;
originally announced April 2020.
-
Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review
Authors:
Thomas Drugman,
Mark Thomas,
Jon Gudnason,
Patrick Naylor,
Thierry Dutoit
Abstract:
The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires, however, that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliability and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.
Submitted 28 December, 2019;
originally announced January 2020.
-
The LOCATA Challenge: Acoustic Source Localization and Tracking
Authors:
Christine Evers,
Heinrich Loellmann,
Heinrich Mellmann,
Alexander Schmidt,
Hendrik Barfuss,
Patrick Naylor,
Walter Kellermann
Abstract:
The ability to localize and track acoustic events is a fundamental prerequisite for equipping machines with the ability to be aware of and engage with humans in their surrounding environment. However, in realistic scenarios, audio signals are adversely affected by reverberation, noise, interference, and periods of speech inactivity. In dynamic scenarios, where the sources and microphone platforms may be moving, the signals are additionally affected by variations in the source-sensor geometries. In practice, approaches to sound source localization and tracking are often impeded by missing estimates of active sources, estimation errors, as well as false estimates. The aim of the LOCAlization and TrAcking (LOCATA) Challenge is to provide an open-access framework for the objective evaluation and benchmarking of broad classes of algorithms for sound source localization and tracking. This article provides a review of relevant localization and tracking algorithms and, within the context of the existing literature, a detailed evaluation and dissemination of the LOCATA submissions. The evaluation highlights achievements in the field and open challenges, and identifies potential future directions.
Submitted 21 October, 2020; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Trisection diagrams and twists of 4-manifolds
Authors:
Patrick Naylor
Abstract:
A theorem of Katanaga, Saeki, Teragaito, and Yamada relates Gluck and Price twists of 4-manifolds. Using trisection diagrams, we give a purely diagrammatic proof of this theorem, and answer a question of Kim and Miller.
Submitted 26 March, 2022; v1 submitted 4 June, 2019;
originally announced June 2019.
-
Detecting Sound-Absorbing Materials in a Room from a Single Impulse Response using a CRNN
Authors:
Constantinos Papayiannis,
Christine Evers,
Patrick A. Naylor
Abstract:
The materials of surfaces in a room play an important role in shaping the auditory experience within it. Different materials absorb energy at different levels. The level of absorption also varies across frequencies. This paper investigates how cues from a measured impulse response in the room can be exploited by machines to detect the materials present. With this motivation, this paper proposes a method for estimating the probability of presence of 10 material categories, based on their frequency-dependent absorption characteristics. The method is based on a CNN-RNN, trained as a multi-task classifier. The network is trained using a priori knowledge about the absorption characteristics of materials from the literature. In the experiments shown, the network is tested on over 5,00 impulse responses and 167 materials. The F1 score of the detections was 98%, with an even precision and recall. The method finds direct applications in architectural acoustics and in creating more parsimonious models for acoustic reflections.
Submitted 27 October, 2019; v1 submitted 17 January, 2019;
originally announced January 2019.
-
Data Augmentation of Room Classifiers using Generative Adversarial Networks
Authors:
Constantinos Papayiannis,
Christine Evers,
Patrick A. Naylor
Abstract:
The classification of acoustic environments allows for machines to better understand the auditory world around them. The use of deep learning in order to teach machines to discriminate between different rooms is a new area of research. Similarly to other learning tasks, this task suffers from the high-dimensionality and the limited availability of training data. Data augmentation methods have proven useful in addressing this issue in the tasks of sound event detection and scene classification. This paper proposes a method for data augmentation for the task of room classification from reverberant speech. Generative Adversarial Networks (GANs) are trained that generate artificial data as if they were measured in real rooms. This provides additional training examples to the classifiers without the need for any additional data collection, which is time-consuming and often impractical. A representation of acoustic environments is proposed, which is used to train the GANs. The representation is based on a sparse model for the early reflections, a stochastic model for the reverberant tail and a mixing mechanism between the two. In the experiments shown, the proposed data augmentation method increases the test accuracy of a CNN-RNN room classifier from 89.4% to 95.5%.
Submitted 4 December, 2020; v1 submitted 10 January, 2019;
originally announced January 2019.
-
End-to-End Classification of Reverberant Rooms using DNNs
Authors:
Constantinos Papayiannis,
Christine Evers,
Patrick A. Naylor
Abstract:
Reverberation is present in our workplaces, our homes, concert halls and theatres. This paper investigates how deep learning can use the effect of reverberation on speech to classify a recording in terms of the room in which it was recorded. Existing approaches in the literature rely on domain expertise to manually select acoustic parameters as inputs to classifiers. Estimation of these parameters from reverberant speech is adversely affected by estimation errors, impacting the classification accuracy. In order to overcome the limitations of previously proposed methods, this paper shows how DNNs can perform the classification by operating directly on reverberant speech spectra and a CRNN with an attention-mechanism is proposed for the task. The relationship is investigated between the reverberant speech representations learned by the DNNs and acoustic parameters. For evaluation, AIRs are used from the ACE-challenge dataset that were measured in 7 real rooms. The classification accuracy of the CRNN classifier in the experiments is 78% when using 5 hours of training data and 90% when using 10 hours.
Submitted 1 November, 2020; v1 submitted 21 December, 2018;
originally announced December 2018.
-
Proceedings of the LOCATA Challenge Workshop -- a satellite event of IWAENC 2018
Authors:
Heinrich W. Loellmann,
Christine Evers,
Alexander Schmidt,
Hendrik Barfuss,
Patrick A. Naylor,
Walter Kellermann
Abstract:
Algorithms for acoustic source localization and tracking provide estimates of the positional information about active sound sources in acoustic environments and are essential for a wide range of applications such as personal assistants, smart homes, tele-conferencing systems, hearing aids, or autonomous systems. The aim of the IEEE-AASP Challenge on sound source localization and tracking (LOCATA) was to objectively benchmark state-of-the-art localization and tracking algorithms using an open-access data corpus of recordings for scenarios typically encountered in audio and acoustic signal processing applications. The challenge tasks ranged from the localization of a single source with a static microphone array to the tracking of multiple moving sources with a moving microphone array.
Submitted 20 August, 2019; v1 submitted 20 November, 2018;
originally announced November 2018.
-
Acoustic Characterization of Environments (ACE) Challenge Results Technical Report
Authors:
James Eaton,
Nikolay D. Gaubitch,
Alastair H. Moore,
Patrick A. Naylor
Abstract:
This document provides the results of the tests of acoustic parameter estimation algorithms on the Acoustic Characterization of Environments (ACE) Challenge Evaluation dataset which were subsequently submitted and written up into papers for the Proceedings of the ACE Challenge. This document is supporting material for a forthcoming journal paper on the ACE Challenge which will provide further analysis of the results.
Submitted 27 June, 2017; v1 submitted 17 December, 2015;
originally announced June 2016.
-
Direct-to-Reverberant Ratio Estimation on the ACE Corpus Using a Two-channel Beamformer
Authors:
James Eaton,
Patrick A. Naylor
Abstract:
Direct-to-Reverberant Ratio (DRR) is an important measure for characterizing the properties of a room. The recently proposed DRR Estimation using a Null-Steered Beamformer (DENBE) algorithm was originally tested on simulated data where noise was artificially added to the speech after convolution with impulse responses simulated using the image-source method. This paper evaluates the performance of this algorithm on speech convolved with measured impulse responses and noise using the Acoustic Characterization of Environments (ACE) Evaluation corpus. The fullband DRR estimation performance of the DENBE algorithm exceeds that of the baselines in all Signal-to-Noise Ratios (SNRs) and noise types. In addition, estimation of the DRR in one third-octave ISO frequency bands is demonstrated.
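For readers unfamiliar with the quantity, DRR is conventionally the ratio, in dB, of the energy in the direct-path portion of an impulse response to the energy in the remainder. The minimal Python sketch below computes it from a known impulse response. It is not the DENBE algorithm, which estimates DRR from reverberant speech; the function name `drr_db`, the symmetric window around the strongest tap, and the default window length of 2.5 ms are assumptions chosen for illustration.

```python
import math


def drr_db(impulse_response, sample_rate, direct_window_ms=2.5):
    """Direct-to-reverberant ratio in dB from a measured impulse response.

    Assumption: the direct path is the strongest tap plus a window of
    +/- direct_window_ms around it; everything after that window counts
    as reverberant energy. Samples before the window are ignored.
    """
    peak = max(range(len(impulse_response)),
               key=lambda i: abs(impulse_response[i]))
    half = int(round(direct_window_ms * 1e-3 * sample_rate))
    start = max(0, peak - half)
    end = min(len(impulse_response), peak + half + 1)

    direct_energy = sum(x * x for x in impulse_response[start:end])
    reverberant_energy = sum(x * x for x in impulse_response[end:])
    if direct_energy == 0 or reverberant_energy == 0:
        raise ValueError("DRR is undefined when either energy is zero")
    return 10.0 * math.log10(direct_energy / reverberant_energy)


# Toy response: one direct tap followed by two weaker reflections.
toy_response = [0.0, 1.0, 0.0, 0.5, 0.5]
toy_drr = drr_db(toy_response, sample_rate=1000, direct_window_ms=1.0)
```

In the toy case the direct energy is 1.0 and the reverberant energy is 0.5, so the result is 10·log10(2), roughly 3 dB.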
Submitted 26 October, 2015;
originally announced October 2015.
-
Evaluating the Non-Intrusive Room Acoustics Algorithm with the ACE Challenge
Authors:
Pablo Peso Parada,
Dushyant Sharma,
Toon van Waterschoot,
Patrick A. Naylor
Abstract:
We present a single channel data driven method for non-intrusive estimation of full-band reverberation time and full-band direct-to-reverberant ratio. The method extracts a number of features from reverberant speech and builds a model using a recurrent neural network to estimate the reverberant acoustic parameters. We explore three configurations by including different data and also by combining the recurrent neural network estimates using a support vector machine. Our best method to estimate DRR provides a Root Mean Square Deviation (RMSD) of 3.84 dB and a RMSD of 43.19 % for T60 estimation.
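The RMSD figure quoted for DRR can be read as the root of the mean squared difference between estimated and ground-truth values. A minimal Python sketch of that generic definition follows; the function name `rmsd` and the example values are hypothetical, and the abstract does not say how the percentage RMSD for T60 is normalised, so that variant is not shown.

```python
import math


def rmsd(estimates, ground_truth):
    """Root mean square deviation between paired estimates and true values."""
    if len(estimates) != len(ground_truth) or not estimates:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - g) ** 2 for e, g in zip(estimates, ground_truth)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up DRR values in dB, used only to exercise the function.
example_rmsd = rmsd([2.0, 6.0], [5.0, 3.0])
```

Both errors in the example have magnitude 3 dB, so the RMSD is 3 dB.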
Submitted 15 October, 2015;
originally announced October 2015.
-
Reverberation time estimation on the ACE corpus using the SDD method
Authors:
James Eaton,
Patrick A. Naylor
Abstract:
Reverberation Time (T60) is an important measure for characterizing the properties of a room. The author's T60 estimation algorithm was previously tested on simulated data where the noise is artificially added to the speech after convolution with impulse responses simulated using the image method. We test the algorithm on speech convolved with real recorded impulse responses and noise from the same rooms from the Acoustic Characterization of Environments (ACE) corpus and achieve results comparable to those using simulated data.
Submitted 5 October, 2015;
originally announced October 2015.
-
Proceedings of the ACE Challenge Workshop - a satellite event of IEEE-WASPAA (2015)
Authors:
James Eaton,
Nikolay D. Gaubitch,
Alastair H. Moore,
Patrick A. Naylor
Abstract:
Several established parameters and metrics have been used to characterize the acoustics of a room. The most important are the Direct-To-Reverberant Ratio (DRR), the Reverberation Time (T60) and the reflection coefficient. The acoustic characteristics of a room based on such parameters can be used to predict the quality and intelligibility of speech signals in that room. Recently, several important methods in speech enhancement and speech recognition have been developed that show an increase in performance compared to the predecessors but do require knowledge of one or more fundamental acoustical parameters such as the T60. Traditionally, these parameters have been estimated using carefully measured Acoustic Impulse Responses (AIRs). However, in most applications it is not practical or even possible to measure the acoustic impulse response. Consequently, there is increasing research activity in the estimation of such parameters directly from speech and audio signals. The aim of this challenge was to evaluate state-of-the-art algorithms for blind acoustic parameter estimation from speech and to promote the emerging area of research in this field. Participants evaluated their algorithms for T60 and DRR estimation against the 'ground truth' values provided with the data-sets and presented the results in a paper describing the method used.
Submitted 1 October, 2015;
originally announced October 2015.
-
Source Coding in Networks with Covariance Distortion Constraints
Authors:
Adel Zahedi,
Jan Østergaard,
Søren Holdt Jensen,
Patrick A. Naylor,
Søren Bech
Abstract:
We consider a source coding problem with a network scenario in mind, and formulate it as a remote vector Gaussian Wyner-Ziv problem under covariance matrix distortions. We define a notion of minimum for two positive-definite matrices based on which we derive an explicit formula for the rate-distortion function (RDF). We then study the special cases and applications of this result. We show that two well-studied source coding problems, i.e. remote vector Gaussian Wyner-Ziv problems with mean-squared error and mutual information constraints are in fact special cases of our results. Finally, we apply our results to a joint source coding and denoising problem. We consider a network with a centralized topology and a given weighted sum-rate constraint, where the received signals at the center are to be fused to maximize the output SNR while enforcing no linear distortion. We show that one can design the distortion matrices at the nodes in order to maximize the output SNR at the fusion center. We thereby bridge between denoising and source coding within this setup.
Submitted 27 September, 2016; v1 submitted 5 April, 2015;
originally announced April 2015.
-
Testing bi-orderability of knot groups
Authors:
Adam Clay,
Colin Desmarais,
Patrick Naylor
Abstract:
We investigate the bi-orderability of two-bridge knot groups and the groups of knots with 12 or fewer crossings by applying recent theorems of Chiswell, Glass and Wilson. Amongst all knots with 12 or fewer crossings (of which there are 2977), previous theorems were only able to determine bi-orderability of 599 of the corresponding knot groups. With our methods we are able to deal with 191 more.
Submitted 21 October, 2014;
originally announced October 2014.
-
Distributed Remote Vector Gaussian Source Coding with Covariance Distortion Constraints
Authors:
Adel Zahedi,
Jan Ostergaard,
Soren Holdt Jensen,
Patrick Naylor,
Soren Bech
Abstract:
In this paper, we consider a distributed remote source coding problem, where a sequence of observations of source vectors is available at the encoder. The problem is to specify the optimal rate for encoding the observations subject to a covariance matrix distortion constraint and in the presence of side information at the decoder. For this problem, we derive lower and upper bounds on the rate-distortion function (RDF) for the Gaussian case, which in general do not coincide. We then provide some cases where the RDF can be derived exactly. We also show that previous results on specific instances of this problem can be generalized using our results. We finally show that if the distortion measure is the mean squared error, or if it is replaced by a certain mutual information constraint, the optimal rate can be derived from our main result.
Submitted 4 June, 2014; v1 submitted 23 January, 2014;
originally announced January 2014.
-
Distributed Remote Vector Gaussian Source Coding for Wireless Acoustic Sensor Networks
Authors:
Adel Zahedi,
Jan Ostergaard,
Soren Holdt Jensen,
Patrick Naylor,
Soren Bech
Abstract:
In this paper, we consider the problem of remote vector Gaussian source coding for a wireless acoustic sensor network. Each node receives messages from multiple nodes in the network and decodes these messages using its own measurement of the sound field as side information. The node's measurement and the estimates of the source resulting from decoding the received messages are then jointly encoded and transmitted to a neighboring node in the network. We show that for this distributed source coding scenario, one can encode a so-called conditional sufficient statistic of the sources instead of jointly encoding multiple sources. We focus on the case where node measurements are in the form of noisy linear mixtures of the sources and the acoustic channel mixing matrices are invertible. For this problem, we derive the rate-distortion function for vector Gaussian sources under covariance distortion constraints.
Submitted 16 January, 2014;
originally announced January 2014.