Showing 1–3 of 3 results for author: Bargum, A R

  1. arXiv:2408.16546

    cs.SD eess.AS

    RAVE for Speech: Efficient Voice Conversion at High Sampling Rates

    Authors: Anders R. Bargum, Simon Lajboschitz, Cumhur Erkut

    Abstract: Voice conversion has gained increasing popularity within the field of audio manipulation and speech synthesis. Often, the main objective is to transfer the input identity to that of a target speaker without changing its linguistic content. While current work provides high-fidelity solutions they rarely focus on model simplicity, high-sampling rate environments or stream-ability. By incorporating s…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted for publication in Proceedings of the 27th International Conference on Digital Audio Effects (DAFx24), Guildford, United Kingdom, 3 - 7 September 2024

  2. arXiv:2311.08104

    cs.SD cs.AI eess.AS

    Reimagining Speech: A Scoping Review of Deep Learning-Powered Voice Conversion

    Authors: Anders R. Bargum, Stefania Serafin, Cumhur Erkut

    Abstract: Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios is getting increasingly popular. Although many of the works in the field of voice conversion share a common global pipeline, there is a considerable diversity in the underlying structures, methods, and neural sub-blocks used across research efforts. Thus, obtaining a comprehensive understanding of the reasons beh…

    Submitted 14 November, 2023; originally announced November 2023.

  3. arXiv:2306.00860

    cs.SD eess.AS

    Differentiable Allpass Filters for Phase Response Estimation and Automatic Signal Alignment

    Authors: Anders R. Bargum, Stefania Serafin, Cumhur Erkut, Julian D. Parker

    Abstract: Virtual analog (VA) audio effects are increasingly based on neural networks and deep learning frameworks. Due to the underlying black-box methodology, a successful model will learn to approximate the data it is presented, including potential errors such as latency and audio dropouts as well as non-linear characteristics and frequency-dependent phase shifts produced by the hardware. The latter is o…

    Submitted 2 June, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

    Comments: Collaboration done while interning/employed at Native Instruments. Accepted for publication in Proc. DAFX'23, Copenhagen, Denmark, September 2023. Sound examples at https://abargum.github.io. v2: 10 pages, LaTeX; figures resized, PDF optimized.