-
Soundbay: Deep Learning Framework for Marine Mammals and Bioacoustic Research
Authors:
Noam Bressler,
Michael Faran,
Amit Galor,
Michael Moshe Michelashvili,
Tomer Nachshon,
Noa Weiss
Abstract:
This paper presents Soundbay, an open-source Python framework that allows bioacoustics and machine learning researchers to implement and use deep-learning-based algorithms for acoustic analysis. Soundbay provides an easy and intuitive platform for applying existing models to one's own data or for creating new models with minimal effort. One of the main advantages of the framework is the ability to compare baselines across different benchmarks, a crucial part of emerging research and development on deep learning algorithms for animal call analysis. We demonstrate this by providing a benchmark for cetacean call detection on multiple datasets. The framework is publicly available at https://github.com/deep-voice/soundbay
Submitted 7 November, 2023;
originally announced November 2023.
-
Hierarchical Timbre-Painting and Articulation Generation
Authors:
Michael Michelashvili,
Lior Wolf
Abstract:
We present a fast and high-fidelity method for music generation, conditioned on specified f0 and loudness, such that the synthesized audio mimics the timbre and articulation of a target instrument. The generation process consists of learned source-filtering networks, which reconstruct the signal at increasing resolutions. The model optimizes a multi-resolution spectral loss as the reconstruction loss, an adversarial loss to make the audio sound more realistic, and a perceptual f0 loss to align the output with the desired input pitch contour. The proposed architecture enables high-quality fitting of an instrument from a training sample as short as a few minutes, and the method demonstrates state-of-the-art timbre transfer capabilities. Code and audio samples are shared at https://github.com/mosheman5/timbre_painting.
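The abstract names a multi-resolution spectral loss as the reconstruction term. Below is a minimal NumPy sketch of such a loss, assuming DDSP-style choices that the abstract does not specify: Hann-windowed STFTs at FFT sizes 2048, 1024, 512 and 256 with 75% overlap, and L1 distances on both linear and log magnitudes, averaged over resolutions. The helper names `stft_mag` and `multi_resolution_spectral_loss` are hypothetical and are not taken from the timbre_painting repository.

```python
import numpy as np


def stft_mag(x, n_fft, hop):
    """Magnitude STFT with a Hann window (no padding; assumes len(x) >= n_fft)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))  # shape: (n_frames, n_fft // 2 + 1)


def multi_resolution_spectral_loss(pred, target, fft_sizes=(2048, 1024, 512, 256), eps=1e-7):
    """Average of spectral L1 distances over several STFT resolutions.

    The FFT sizes, 75% overlap and linear + log terms are assumptions,
    not the paper's confirmed configuration.
    """
    loss = 0.0
    for n_fft in fft_sizes:
        hop = n_fft // 4
        p = stft_mag(pred, n_fft, hop)
        t = stft_mag(target, n_fft, hop)
        loss += np.mean(np.abs(p - t))  # linear-magnitude L1
        loss += np.mean(np.abs(np.log(p + eps) - np.log(t + eps)))  # log-magnitude L1
    return loss / len(fft_sizes)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    target = np.sin(2 * np.pi * 440 * t)
    pred = np.sin(2 * np.pi * 450 * t)
    print(multi_resolution_spectral_loss(pred, target))
```

Comparing spectra at several FFT sizes trades time resolution against frequency resolution, so an error that one STFT smears out is still penalized at another.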
Submitted 7 September, 2020; v1 submitted 30 August, 2020;
originally announced August 2020.
-
Speech Denoising by Accumulating Per-Frequency Modeling Fluctuations
Authors:
Michael Michelashvili,
Lior Wolf
Abstract:
We present a method for audio denoising that combines processing in both the time domain and the time-frequency domain. Given a noisy audio clip, the method trains a deep neural network to fit this signal. Since the fitting is only partly successful, and the network captures the underlying clean signal better than the noise, the network's output helps disentangle the clean audio from the rest of the signal. This is done by accumulating a fitting score per time-frequency bin and applying time-frequency domain filtering based on the obtained scores. The method is completely unsupervised and trains only on the specific audio clip being denoised. Our experiments demonstrate favorable performance compared with methods from the literature. Our code and samples are available at github.com/mosheman5/DNP and as supplementary material. Index Terms: Audio denoising; Unsupervised learning
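The abstract describes accumulating a fitting score per time-frequency bin and filtering based on those scores. The sketch below illustrates that filtering step only, under loudly stated assumptions: the score is taken to be the accumulated absolute error between the network's output magnitude and the noisy magnitude over recorded training iterations, bins with lower accumulated error are treated as more likely clean, and scores are min-max normalized within each frequency row into a soft mask. The network fitting itself is omitted, and the names `accumulate_fitting_error`, `per_frequency_mask` and `denoise` are hypothetical, not the DNP repository's code.

```python
import numpy as np


def accumulate_fitting_error(output_mags, noisy_mag):
    """Sum |output - noisy| per time-frequency bin over fitting iterations.

    output_mags: (n_iters, n_freq, n_frames) magnitudes of the network output
    recorded during fitting (assumed available). noisy_mag: (n_freq, n_frames).
    """
    outs = np.asarray(output_mags, dtype=float)
    return np.sum(np.abs(outs - noisy_mag[None, :, :]), axis=0)


def per_frequency_mask(scores, eps=1e-12):
    """Soft mask in [0, 1]; low accumulated error -> close to 1 (assumption).

    Scores are min-max normalized within each frequency row (axis 1 = frames).
    """
    lo = scores.min(axis=1, keepdims=True)
    hi = scores.max(axis=1, keepdims=True)
    return 1.0 - (scores - lo) / (hi - lo + eps)


def denoise(noisy_stft, output_mags):
    """Apply the score-based mask to the complex noisy STFT."""
    scores = accumulate_fitting_error(output_mags, np.abs(noisy_stft))
    return noisy_stft * per_frequency_mask(scores)
```

The masked STFT would then be inverted back to a waveform; the normalization and the error-based score are illustrative choices, and the paper may define the score and the filter differently.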
Submitted 8 June, 2020; v1 submitted 16 April, 2019;
originally announced April 2019.
-
Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures
Authors:
Michael Michelashvili,
Sagie Benaim,
Lior Wolf
Abstract:
We study the problem of semi-supervised singing voice separation, in which the training data contains a set of samples of mixed music (singing and instrumental) and an unmatched set of instrumental music. Our solution employs a single mapping function g which, applied to a mixed sample, recovers the underlying instrumental music and, applied to an instrumental sample, returns the same sample. The network g is trained on purely instrumental samples, as well as on synthetic mixed samples created by mixing reconstructed singing voices with random instrumental samples. Our results indicate that the method performs on par with or better than fully supervised methods, which are also given training samples of unmixed singing voices, and better than other recent semi-supervised methods.
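The training scheme in the abstract can be sketched as building two kinds of (input, target) pairs for g: identity pairs from instrumental samples, and synthetic mixtures whose target is the instrumental sample that was mixed in. This is a minimal sketch assuming equal-length waveforms and assuming the singing voice is reconstructed as the residual `mix - g(mix)`; the abstract does not state how the voice is reconstructed, and `make_training_pairs` is a hypothetical helper, not the authors' code.

```python
import numpy as np


def make_training_pairs(g, mixtures, instrumentals, rng):
    """Build (input, target) pairs for training the mapping g.

    g: callable mapping a waveform to its estimated instrumental part.
    mixtures: list of mixed (singing + instrumental) waveforms.
    instrumentals: list of unmatched instrumental-only waveforms.
    All waveforms are assumed to share the same length.
    """
    pairs = []
    # Identity pairs: applied to instrumental music, g should return it unchanged.
    for inst in instrumentals:
        pairs.append((inst, inst))
    # Synthetic mixtures: reconstructed voice + a randomly chosen instrumental sample.
    for mix in mixtures:
        voice_est = mix - g(mix)  # assumption: voice is the residual of g's estimate
        inst = instrumentals[rng.integers(len(instrumentals))]
        pairs.append((voice_est + inst, inst))
    return pairs
```

Because the target of each synthetic mixture is known exactly, g receives a supervised signal even though no clean singing voice appears in the training data.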
Submitted 6 May, 2019; v1 submitted 14 December, 2018;
originally announced December 2018.