-
Improving snore detection under limited dataset through harmonic/percussive source separation and convolutional neural networks
Authors:
F. D. Gonzalez-Martinez,
J. J. Carabias-Orti,
F. J. Canadas-Quesada,
N. Ruiz-Reyes,
D. Martinez-Munoz,
S. Garcia-Galan
Abstract:
Snoring, an acoustic biomarker commonly observed in individuals with Obstructive Sleep Apnoea Syndrome (OSAS), holds significant potential for diagnosing and monitoring this recognized clinical disorder. Irrespective of snoring types, most snoring instances exhibit identifiable harmonic patterns manifested through distinctive energy distributions over time. In this work, we propose a novel method…
▽ More
Snoring, an acoustic biomarker commonly observed in individuals with Obstructive Sleep Apnoea Syndrome (OSAS), holds significant potential for diagnosing and monitoring this recognized clinical disorder. Irrespective of snoring types, most snoring instances exhibit identifiable harmonic patterns manifested through distinctive energy distributions over time. In this work, we propose a novel method to differentiate monaural snoring from non-snoring sounds by analyzing the harmonic content of the input sound using harmonic/percussive sound source separation (HPSS). The resulting feature, based on the harmonic spectrogram from HPSS, is employed as input data for conventional neural network architectures, aiming to enhance snoring detection performance even under a limited data learning framework. To evaluate the performance of our proposal, we studied two different scenarios: 1) using a large dataset of snoring and interfering sounds, and 2) using a reduced training set composed of around 1% of the data material. In the former scenario, the proposed HPSS-based feature provides competitive results compared to other input features from the literature. However, the key advantage of the proposed method lies in the superior performance of the harmonic spectrogram derived from HPSS in a limited data learning context. In this particular scenario, using the proposed harmonic feature significantly enhances the performance of all the studied architectures in comparison to the classical input features documented in the existing literature. This finding clearly demonstrates that incorporating harmonic content enables more reliable learning of the essential time-frequency characteristics that are prevalent in most snoring sounds, even in scenarios where the amount of training data is limited.
△ Less
Submitted 31 October, 2024;
originally announced October 2024.
-
SynthSOD: Developing an Heterogeneous Dataset for Orchestra Music Source Separation
Authors:
Jaime Garcia-Martinez,
David Diaz-Guerra,
Archontis Politis,
Tuomas Virtanen,
Julio J. Carabias-Orti,
Pedro Vera-Candeas
Abstract:
Recent advancements in music source separation have significantly progressed, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similarly sounding sources from orchestra recordings has not been extensively e…
▽ More
Recent advancements in music source separation have significantly progressed, particularly in isolating vocals, drums, and bass elements from mixed tracks. These developments owe much to the creation and use of large-scale, multitrack datasets dedicated to these specific components. However, the challenge of extracting similarly sounding sources from orchestra recordings has not been extensively explored, largely due to a scarcity of comprehensive and clean (i.e bleed-free) multitrack datasets. In this paper, we introduce a novel multitrack dataset called SynthSOD, developed using a set of simulation techniques to create a realistic (i.e. using high-quality soundfonts), musically motivated, and heterogeneous training set comprising different dynamics, natural tempo changes, styles, and conditions. Moreover, we demonstrate the application of a widely used baseline music separation model trained on our synthesized dataset w.r.t to the well-known EnsembleSet, and evaluate its performance under both synthetic and real-world conditions.
△ Less
Submitted 17 February, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF
Authors:
Antonio J. Muñoz-Montoro,
Julio J. Carabias-Orti,
Archontis Politis,
Konstantinos Drossos
Abstract:
This work addresses the problem of multichannel source separation combining two powerful approaches, multichannel spectral factorization with recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), able to model long-term temporal patterns of a musical piece. The monophonic sourc…
▽ More
This work addresses the problem of multichannel source separation combining two powerful approaches, multichannel spectral factorization with recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), able to model long-term temporal patterns of a musical piece. The monophonic source spectrograms are used within a spatial covariance mixing model based on Complex Non-Negative Matrix Factorization (CNMF) that predicts the spatial characteristics of each source. The proposed framework is evaluated on the task of singing voice separation with a large multichannel dataset. Experimental results show that our joint DL+CNMF method outperforms both the individual monophonic DL-based separation and the multichannel CNMF baseline methods.
△ Less
Submitted 2 March, 2020;
originally announced March 2020.