Search | arXiv e-print repository

Beyond Global Metrics: A Fairness Analysis for Interpretable Voice Disorder Detection Systems

Authors: Mariel Estevez, Cyntia Bonomi, Dayana Ribas, Alfonso Ortega, Luciana Ferrer

Abstract: We conducted a comprehensive analysis of an Automatic Voice Disorders Detection (AVDD) system using existing voice disorder datasets with available demographic metadata. The study involved analysing system performance across various demographic groups, particularly focusing on gender and age-based cohorts. Performance evaluation was based on multiple metrics, including normalised costs and cross-entropy. We employed calibration techniques trained separately on predefined demographic groups to address group-dependent miscalibration. Analysis revealed significant performance disparities across groups despite strong global metrics. The system showed systematic biases, misclassifying healthy speakers over 55 as having a voice disorder and speakers with disorders aged 14-30 as healthy. Group-specific calibration improved posterior probability quality, reducing overconfidence. For young disordered speakers, low severity scores were identified as contributing to poor system performance. For older speakers, age-related voice characteristics and potential limitations in the pretrained Hubert model used as feature extractor likely affected results. The study demonstrates that global performance metrics are insufficient for evaluating AVDD system performance. Group-specific analysis may unmask problems in system performance which are hidden within global metrics. Further, group-dependent calibration strategies help mitigate biases, resulting in a more reliable indication of system confidence.
These findings emphasize the need for demographic-specific evaluation and calibration in voice disorder detection systems, while providing a methodological framework applicable to broader biomedical classification tasks where demographic metadata is available.

Submitted 11 April, 2025; originally announced April 2025.

Comments: 34 pages, 6 figures, 2 tables

arXiv:2012.14180 [pdf]

doi 10.12688/openreseurope.13135.2

A Google Earth Engine-enabled Python approach to improve identification of anthropogenic palaeo-landscape features

Authors: Filippo Brandolini, Guillem Domingo Ribas, Andrea Zerboni, Sam Turner

Abstract: The necessity of sustainable development for landscapes has emerged as an important theme in recent decades. Current methods take a holistic approach to landscape heritage and promote an interdisciplinary dialogue to facilitate complementary landscape management strategies. With the socio-economic values of the natural and cultural landscape heritage increasingly recognised worldwide, remote sensing tools are being used more and more to facilitate the recording and management of landscape heritage. Satellite remote sensing technologies have enabled significant improvements in landscape research. The advent of the cloud-based platform of Google Earth Engine (GEE) has allowed the rapid exploration and processing of satellite imagery such as the Landsat and Copernicus Sentinel datasets. In this paper, the use of Sentinel-2 satellite data in the identification of palaeo-riverscape features has been assessed in the Po Plain, selected because it has been characterized by human exploitation since the Mid-Holocene. A multi-temporal approach has been adopted to investigate the potential of satellite imagery to detect buried hydrological and anthropogenic features along with Spectral Index and Spectral Decomposition analysis. This research represents one of the first applications of the GEE Python API in landscape studies. The complete FOSS-cloud protocol proposed here consists of a Python code script developed in Google Colab which could be simply adapted and replicated in different areas of the world.

Submitted 28 December, 2020; originally announced December 2020.

Comments: 33 pages, 10 figures, 2 tables

arXiv:1904.05167 [pdf, other]

Speech Enhancement with Wide Residual Networks in Reverberant Environments

Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida

Abstract: This paper proposes a speech enhancement method which exploits the high potential of residual connections in a Wide Residual Network architecture. The method relies on one-dimensional convolutions computed along the time dimension, which is a powerful approach to process contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task since the signal always has a linear shortcut and the non-linear path enhances it in several steps by adding or subtracting corrections. The enhancement capability of the proposal is assessed by objective quality metrics evaluated with simulated and real samples of reverberated speech signals. Results show that the proposal outperforms the state-of-the-art method called WPE, which is known to effectively reduce reverberation and greatly enhance the signal. The proposed model, trained with artificially synthesized reverberation data, was able to generalize to real room impulse responses for a variety of conditions (e.g. different room sizes, $RT_{60}$, near & far field). Furthermore, it achieves accuracy for real speech with reverberation from two different datasets.

Submitted 9 April, 2019; originally announced April 2019.

Comments: 5 pages, 4 figures. arXiv admin note: text overlap with arXiv:1901.00660, arXiv:1904.04511

arXiv:1904.04511 [pdf, other]

Progressive Speech Enhancement with Residual Connections

Authors: Jorge Llombart, Dayana Ribas, Antonio Miguel, Luis Vicente, Alfonso Ortega, Eduardo Lleida

Abstract: This paper studies speech enhancement based on deep neural networks. The proposed architecture gradually follows the signal transformation during enhancement by means of a visualization probe at each network block. Throughout the process, the enhancement performance is visually inspected and evaluated in terms of regression cost. This progressive scheme is based on Residual Networks. During the process, we investigate a residual connection with a constant number of channels, including internal state between blocks, and adding progressive supervision. The insights provided by the interpretation of the network enhancement process lead us to design an improved architecture for the enhancement purpose. Following this strategy, we are able to obtain speech enhancement results beyond the state-of-the-art, achieving a favorable trade-off between dereverberation and the amount of spectral distortion.

Submitted 9 April, 2019; originally announced April 2019.

Comments: 5 pages, 5 figures

arXiv:1902.05761 [pdf, other]

An improved uncertainty propagation method for robust i-vector based speaker recognition

Authors: Dayana Ribas, Emmanuel Vincent

Abstract: The performance of automatic speaker recognition systems degrades when facing distorted speech data containing additive noise and/or reverberation. Statistical uncertainty propagation has been introduced as a promising paradigm to address this challenge. So far, different uncertainty propagation methods have been proposed to compensate for noise and reverberation in i-vectors in the context of speaker recognition. They have achieved promising results on small datasets such as YOHO and Wall Street Journal, but little or no improvement on the larger, highly variable NIST Speaker Recognition Evaluation (SRE) corpus. In this paper, we propose a complete uncertainty propagation method, whereby we model the effect of uncertainty both in the computation of unbiased Baum-Welch statistics and in the derivation of the posterior expectation of the i-vector. We conduct experiments on the NIST-SRE corpus mixed with real domestic noise and reverberation from the CHiME-2 corpus and preprocessed by multichannel speech enhancement. The proposed method improves the equal error rate (EER) by 4% relative compared to a conventional i-vector based speaker verification baseline. This is to be compared with previous methods which degrade performance.

Submitted 19 February, 2019; v1 submitted 15 February, 2019; originally announced February 2019.

Journal ref: 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019), May 2019, Brighton, United Kingdom

arXiv:1901.00660 [pdf, other]

Deep Speech Enhancement for Reverberated and Noisy Signals using Wide Residual Networks

Authors: Dayana Ribas, Jorge Llombart, Antonio Miguel, Luis Vicente

Abstract: This paper proposes a deep speech enhancement method which exploits the high potential of residual connections in a wide neural network architecture, a topology known as Wide Residual Network. The method relies on one-dimensional convolutions computed along the time dimension, which is a powerful approach to process contextually correlated representations through the temporal domain, such as speech feature sequences. We find the residual mechanism extremely useful for the enhancement task since the signal always has a linear shortcut and the non-linear path enhances it in several steps by adding or subtracting corrections. The enhancement capacity of the proposal is assessed by objective quality metrics and the performance of a speech recognition system. This was evaluated in the framework of the REVERB Challenge dataset, including simulated and real samples of reverberated and noisy speech signals. Results showed that enhanced speech from the proposed method succeeded for both the enhancement task aimed at intelligibility and the speech recognition system. The DNN model, trained with artificially synthesized reverberation data, was able to deal with far-field reverberated speech from real scenarios. Furthermore, the method was able to take advantage of the residual connection to enhance signals with a low noise level, which is usually a strong handicap of traditional enhancement methods.

Submitted 3 January, 2019; originally announced January 2019.

arXiv:1811.03494 [pdf]

Testing SPARUS II AUV, an open platform for industrial, scientific and academic applications

Authors: Marc Carreras, Carles Candela, David Ribas, Narcís Palomeras, Lluís Magií, Angelos Mallios, Eduard Vidal, Èric Pairet, Pere Ridao

Abstract: This paper describes the experience of preparing and testing the SPARUS II AUV in different applications. The AUV was designed as a lightweight vehicle combining the classical torpedo-shape features with the hovering capability. The robot has a payload area to allow the integration of different equipment depending on the application. The software architecture is based on ROS, an open framework that allows an easy integration of many devices and systems. Its flexibility, easy operation and openness make the SPARUS II AUV a multipurpose platform that can adapt to industrial, scientific and academic applications. Five units were developed in 2014, and different teams used and adapted the platform for different applications. The paper describes some of the experiences in preparing and testing this open platform for different applications.

Submitted 6 November, 2018; originally announced November 2018.

Showing 1–7 of 7 results for author: Ribas, D