-
AEROMamba: An efficient architecture for audio super-resolution using generative adversarial networks and state space models
Authors:
Wallace Abreu,
Luiz Wagner Pereira Biscainho
Abstract:
Audio super-resolution aims to enhance low-resolution signals by creating high-frequency content. In this work, we modify the architecture of AERO (a state-of-the-art system for this task) for music super-resolution. Specifically, we replace its original Attention and LSTM layers with Mamba, a State Space Model (SSM), across all network layers. Mamba can effectively substitute for these modules, as it offers a mechanism similar to that of Attention while also functioning as a recurrent network. With the proposed AEROMamba, training requires 2-4x less GPU memory, since Mamba exploits the convolutional formulation and leverages the GPU memory hierarchy. Additionally, during inference, Mamba operates in constant memory due to recurrence, avoiding the memory growth associated with Attention. This results in a 14x speed improvement using 5x less GPU memory. Subjective listening tests (0 to 100 scale) show that the proposed model surpasses the AERO model. On the MUSDB dataset, degraded signals scored 38.22, while AERO and AEROMamba scored 60.03 and 66.74, respectively. On the PianoEval dataset, scores were 72.92 for degraded signals, 76.89 for AERO, and 84.41 for AEROMamba.
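The recurrent/convolutional duality of state space models mentioned above can be illustrated with a small sketch. It assumes a time-invariant, diagonal linear SSM (the S4-style case), with illustrative function names and parameter values that are not taken from AEROMamba. Note that in Mamba's selective variant the parameters depend on the input, so the fixed kernel shown here is a simplification. Running the recurrence needs memory only for the N-dimensional state, regardless of sequence length. The same outputs can also be obtained by convolving the input with the kernel K_j = C A^j B.

```python
from typing import List

# Hypothetical illustration of a time-invariant diagonal linear SSM (S4-style),
# not AEROMamba's or Mamba's code. Discrete model per step k:
#   x_k = A x_{k-1} + B u_k,   y_k = C x_k,   with A diagonal and x_{-1} = 0.

def ssm_recurrent(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Step-by-step recurrence: memory holds only the N-dimensional state."""
    state = [0.0] * len(a)
    ys = []
    for u_k in u:
        state = [a_i * x_i + b_i * u_k for a_i, x_i, b_i in zip(a, state, b)]
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, state)))
    return ys

def ssm_kernel(length: int, a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Convolution kernel K_j = C A^j B for j = 0 .. length-1."""
    return [sum(c_i * a_i ** j * b_i for a_i, b_i, c_i in zip(a, b, c)) for j in range(length)]

def ssm_convolutional(u: List[float], a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Causal convolution y_t = sum_j K_j u_{t-j}; same outputs as the recurrence."""
    k = ssm_kernel(len(u), a, b, c)
    return [sum(k[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]

u = [1.0, 0.5, -0.25, 2.0, 0.0]
a, b, c = [0.9, 0.5], [1.0, 0.3], [0.2, -1.0]
y_rec = ssm_recurrent(u, a, b, c)
y_conv = ssm_convolutional(u, a, b, c)
```

Both functions return the same sequence up to floating-point rounding. The recurrent form is what allows constant-memory inference, while the kernel form lets a whole training sequence be processed at once.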
Submitted 11 November, 2024;
originally announced November 2024.
-
Adapting Meter Tracking Models to Latin American Music
Authors:
Lucas S. Maia,
Martín Rocamora,
Luiz W. P. Biscainho,
Magdalena Fuentes
Abstract:
Beat and downbeat tracking models have improved significantly in recent years with the introduction of deep learning methods. However, despite these improvements, several challenges remain. In particular, adapting available models to music traditions underrepresented in MIR usually means collecting and annotating large amounts of data, which is impractical and time-consuming. Transfer learning, data augmentation, and fine-tuning techniques have been used quite successfully in related tasks and are known to alleviate this bottleneck. Furthermore, when studying these music traditions, models are not required to generalize to multiple mainstream music genres but to perform well in more constrained, homogeneous conditions. In this work, we investigate simple yet effective strategies to adapt beat and downbeat tracking models to two different Latin American music traditions and analyze the feasibility of these adaptations in real-world applications concerning the data and computational requirements. Contrary to common belief, our findings show it is possible to achieve good performance by spending just a few minutes annotating a portion of the data and training a model on a standard CPU machine, with the precise amount of resources needed depending on the task and the complexity of the dataset.
Submitted 14 April, 2023;
originally announced April 2023.
-
Bayesian Restoration of Audio Degraded by Low-Frequency Pulses Modeled via Gaussian Process
Authors:
Hugo Tremonte de Carvalho,
Flávio Rainho Ávila,
Luiz Wagner Pereira Biscainho
Abstract:
A common defect found when reproducing old vinyl and gramophone recordings with mechanical devices is the occurrence of long pulses with significant low-frequency content, caused by the interaction of the arm-needle system with deep scratches or even breakages on the media surface. Previous approaches to suppressing these pulses in digitized versions of the recordings depend on a prior estimate of the pulse location, usually obtained via heuristic methods. This paper proposes a novel Bayesian approach capable of jointly estimating the pulse location; interpolating the almost annihilated signal underlying the strong discontinuity that initiates the pulse; and estimating the long pulse tail with a simple Gaussian Process, allowing its suppression from the corrupted signal. The posterior distribution of the model parameters, as well as that of the pulse, is explored via Markov chain Monte Carlo (MCMC) algorithms. Controlled experiments indicate that the proposed method, while requiring significantly less user intervention, achieves perceptual results similar to those of previous approaches and performs well when dealing with naturally degraded signals.
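The idea of estimating a smooth low-frequency pulse tail with a Gaussian Process and subtracting it can be illustrated with a minimal sketch. It is not the paper's model, and it rests on several simplifying assumptions. The pulse position is taken as known, whereas the paper estimates it jointly. The underlying audio is treated as white noise. A squared-exponential kernel with hand-picked hyperparameters is used. The closed-form GP posterior mean replaces MCMC sampling. The function names (`estimate_pulse_tail`, `cholesky_solve`) and all parameter values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical illustration only: GP posterior mean of a smooth tail, with the
# underlying audio modeled as white noise of variance noise_var.

def se_kernel(t1: float, t2: float, variance: float, length_scale: float) -> float:
    """Squared-exponential covariance between two sample times."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)

def cholesky(m: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def cholesky_solve(low: Matrix, rhs: List[float]) -> List[float]:
    """Solve (L L^T) x = rhs by forward, then backward substitution."""
    n = len(low)
    z = [0.0] * n
    for i in range(n):
        z[i] = (rhs[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x

def estimate_pulse_tail(y: List[float], variance: float = 1.0,
                        length_scale: float = 5.0, noise_var: float = 0.01) -> List[float]:
    """Posterior mean K (K + noise_var I)^-1 y at the observed sample times."""
    n = len(y)
    k = [[se_kernel(i, j, variance, length_scale) for j in range(n)] for i in range(n)]
    k_noisy = [[k[i][j] + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = cholesky_solve(cholesky(k_noisy), y)
    return [sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]

# Synthetic example: decaying pulse tail plus a small high-frequency "audio" component.
n = 60
tail = [math.exp(-t / 15.0) for t in range(n)]
audio = [0.1 * (-1) ** t for t in range(n)]
observed = [p + s for p, s in zip(tail, audio)]
tail_hat = estimate_pulse_tail(observed)
restored = [o - e for o, e in zip(observed, tail_hat)]
```

Because the kernel is smooth, the posterior mean acts as a low-pass estimate. It follows the slowly decaying tail while largely ignoring the rapidly alternating component, so `restored` ends up close to `audio`.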
Submitted 26 September, 2020; v1 submitted 28 May, 2020;
originally announced May 2020.
-
Mobile Sound Recognition for the Deaf and Hard of Hearing
Authors:
Leonardo A. Fanzeres,
Adriana S. Vivacqua,
Luiz W. P. Biscainho
Abstract:
Human perception of surrounding events is strongly dependent on audio cues. Thus, acoustic insulation can seriously impact situational awareness. We present an exploratory study in the domain of assistive computing, eliciting requirements and presenting solutions to problems found in the development of an environmental sound recognition system, which aims to assist deaf and hard of hearing people in the perception of sounds. To take advantage of the computational ubiquity of smartphones, we propose a system that executes all processing on the device itself, from audio feature extraction to recognition and visual presentation of results. Our application also presents the confidence level of the classification to the user. A test of the system conducted with deaf users provided important and inspiring feedback from participants.
Submitted 19 October, 2018;
originally announced October 2018.
-
Efficient Steered-Response Power Methods for Sound Source Localization Using Microphone Arrays
Authors:
Markus V. S. Lima,
Wallace A. Martins,
Leonardo O. Nunes,
Luiz W. P. Biscainho,
Tadeu N. Ferreira,
Maurício V. M. Costa,
Bowon Lee
Abstract:
This paper proposes an efficient method based on the steered-response power (SRP) technique for sound source localization using microphone arrays: the volumetric SRP (V-SRP). Compared to the SRP, the V-SRP deploys a sparser volumetric grid and thereby achieves a significant reduction in computational complexity without sacrificing the accuracy of the location estimates. By appending a fine search step to the V-SRP, its refined version (RV-SRP) improves the trade-off between complexity and accuracy. Experiments conducted in both simulated- and real-data scenarios demonstrate the benefits of the proposed approaches. Specifically, the RV-SRP is shown to outperform the SRP in accuracy at a computational cost about ten times lower.
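The general SRP idea can be illustrated with a small two-dimensional sketch. Up to a constant, the steered response power at a candidate point equals the sum, over microphone pairs, of the cross-correlation evaluated at the time difference of arrival (TDOA) that point would produce. The sketch makes several assumptions that do not come from the paper:

* four microphones at the corners of a 1 m square;
* integer-sample propagation delays;
* a low-pass noise source;
* plain (unweighted) cross-correlation rather than a PHAT-weighted one;
* a generic coarse-then-fine grid search standing in for a refinement step.

It is not the V-SRP or RV-SRP algorithm. All function names and parameter values are hypothetical.

```python
import math
import random
from itertools import combinations

# Hypothetical illustration of grid-based SRP localization; not the paper's V-SRP/RV-SRP.
FS = 16000       # sampling rate in Hz (assumed)
C = 343.0        # speed of sound in m/s (assumed)
MAX_LAG = 70     # tabulated lags; the 1 m square's diagonal spans about 66 samples

def delay_samples(p, mic):
    """Propagation delay from point p to a microphone, in (fractional) samples."""
    return math.dist(p, mic) * FS / C

def simulate(mics, source, n=4096, smooth=16, seed=0):
    """Moving-average (low-pass) noise source, delayed by whole samples at each mic."""
    rng = random.Random(seed)
    white = [rng.uniform(-1.0, 1.0) for _ in range(n + smooth)]
    src = [sum(white[i:i + smooth]) / smooth for i in range(n)]
    signals = []
    for mic in mics:
        d = round(delay_samples(source, mic))
        signals.append([0.0] * d + src[: n - d])
    return signals

def cross_correlations(signals):
    """Table r[(i, j)][tau] = sum_k x_i[k] * x_j[k + tau], peaking at tau = d_j - d_i."""
    n = len(signals[0])
    table = {}
    for i, j in combinations(range(len(signals)), 2):
        xi, xj = signals[i], signals[j]
        table[(i, j)] = {
            tau: sum(xi[k] * xj[k + tau] for k in range(max(0, -tau), min(n, n - tau)))
            for tau in range(-MAX_LAG, MAX_LAG + 1)
        }
    return table

def srp(p, mics, table):
    """SRP at p up to a constant: pairwise correlations read at p's predicted TDOAs."""
    score = 0.0
    for (i, j), r in table.items():
        tau = round(delay_samples(p, mics[j]) - delay_samples(p, mics[i]))
        score += r[tau]
    return score

def grid_search(mics, table, centre, half_width, step):
    """Exhaustive search over a square grid; returns the point with the largest SRP."""
    count = int(round(2 * half_width / step))
    best, best_score = centre, -math.inf
    for a in range(count + 1):
        for b in range(count + 1):
            p = (centre[0] - half_width + a * step, centre[1] - half_width + b * step)
            score = srp(p, mics, table)
            if score > best_score:
                best, best_score = p, score
    return best

mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
source = (0.63, 0.41)
table = cross_correlations(simulate(mics, source))
coarse = grid_search(mics, table, (0.5, 0.5), 0.5, 0.1)   # sparse grid over the whole square
fine = grid_search(mics, table, coarse, 0.2, 0.01)        # dense grid around the coarse peak
```

The correlations are computed once and reused at every candidate point, so the cost of the search is dominated by the number of grid points evaluated. That is why a sparse first pass followed by a local dense pass is cheaper than one dense grid over the whole area. The low-pass source is a deliberate choice in this sketch. With white noise, the correlation peaks would be about one sample wide, and a 10 cm grid would usually step over them.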
Submitted 15 February, 2015; v1 submitted 9 July, 2014;
originally announced July 2014.