Showing 1–8 of 8 results for author: Stöter, F

Searching in archive eess.

  1. arXiv:2408.06370  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Lyrics Transcription for Humans: A Readability-Aware Benchmark

    Authors: Ondřej Cífka, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter

    Abstract: Writing down lyrics for human consumption involves not only accurately capturing word sequences, but also incorporating punctuation and formatting for clarity and to convey contextual information. This includes song structure, emotional emphasis, and contrast between lead and background vocals. While automatic lyrics transcription (ALT) systems have advanced beyond producing unstructured strings o…

    Submitted 30 July, 2024; originally announced August 2024.

    Comments: ISMIR 2024 camera-ready. 6 pages + references + supplementary material. Website https://audioshake.github.io/jam-alt/ Data https://huggingface.co/datasets/audioshake/jam-alt Code https://github.com/audioshake/alt-eval/. arXiv admin note: text overlap with arXiv:2311.13987

  2. arXiv:2311.13987  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Jam-ALT: A Formatting-Aware Lyrics Transcription Benchmark

    Authors: Ondřej Cífka, Constantinos Dimitriou, Cheng-i Wang, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter

    Abstract: Current automatic lyrics transcription (ALT) benchmarks focus exclusively on word content and ignore the finer nuances of written lyrics including formatting and punctuation, which leads to a potential misalignment with the creative products of musicians and songwriters as well as listeners' experiences. For example, line breaks are important in conveying information about rhythm, emotional emphas…

    Submitted 23 November, 2023; originally announced November 2023.

    Comments: 6 pages (3 pages main content); website: https://audioshake.github.io/jam-alt/; data: https://huggingface.co/datasets/audioshake/jam-alt; code: https://github.com/audioshake/alt-eval/

  3. arXiv:2308.06979  [pdf, other]

    eess.AS cs.SD

    The Sound Demixing Challenge 2023 – Music Demixing Track

    Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, et al. (2 additional authors not shown)

    Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t…

    Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

    Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

    Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

  4. Music Demixing Challenge 2021

    Authors: Yuki Mitsufuji, Giorgio Fabbro, Stefan Uhlich, Fabian-Robert Stöter, Alexandre Défossez, Minseok Kim, Woosung Choi, Chin-Yun Yu, Kin-Wai Cheuk

    Abstract: Music source separation has been intensively studied in the last decade and tremendous progress with the advent of deep learning could be observed. Evaluation campaigns such as MIREX or SiSEC connected state-of-the-art models and corresponding papers, which can help researchers integrate the best practices into their models. In recent years, the widely used MUSDB18 dataset played an important role…

    Submitted 23 May, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Journal ref: Frontiers in Signal Processing, 28 January 2022

  5. arXiv:2005.04132  [pdf, other]

    eess.AS cs.SD

    Asteroid: the PyTorch-based audio source separation toolkit for researchers

    Authors: Manuel Pariente, Samuele Cornell, Joris Cosentino, Sunit Sivasankaran, Efthymios Tzinis, Jens Heitkaemper, Michel Olvera, Fabian-Robert Stöter, Mathieu Hu, Juan M. Martín-Doñas, David Ditter, Ariel Frank, Antoine Deleforge, Emmanuel Vincent

    Abstract: This paper describes Asteroid, the PyTorch-based audio source separation toolkit for researchers. Inspired by the most successful neural source separation systems, it provides all neural building blocks required to build such a system. To improve reproducibility, Kaldi-style recipes on common audio source separation datasets are also provided. This paper describes the software architecture of Aste…

    Submitted 8 May, 2020; originally announced May 2020.

    Comments: Submitted to Interspeech 2020

  6. arXiv:1804.08300  [pdf, other]

    cs.SD eess.AS

    An Overview of Lead and Accompaniment Separation in Music

    Authors: Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry FitzGerald, Bryan Pardo

    Abstract: Popular music is often composed of an accompaniment and a lead component, the latter typically consisting of vocals. Filtering such mixtures to extract one or both components has many applications, such as automatic karaoke and remixing. This particular case of source separation yields very specific challenges and opportunities, including the particular complexity of musical structures, but also r…

    Submitted 23 April, 2018; originally announced April 2018.

  7. arXiv:1804.06267  [pdf, other]

    eess.AS cs.SD

    The 2018 Signal Separation Evaluation Campaign

    Authors: Fabian-Robert Stöter, Antoine Liutkus, Nobutaka Ito

    Abstract: This paper reports the organization and results for the 2018 community-based Signal Separation Evaluation Campaign (SiSEC 2018). This year's edition was focused on audio and pursued the effort towards scaling up and making it easier to prototype audio separation software in an era of machine-learning based systems. For this purpose, we prepared a new music separation database: MUSDB18, featuring c…

    Submitted 6 July, 2018; v1 submitted 17 April, 2018; originally announced April 2018.

    Comments: To appear in International Conference on Latent Variable Analysis and Signal Separation

  8. Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

    Authors: Fabian-Robert Stöter, Soumitro Chakrabarty, Bernd Edler, Emanuël A. P. Habets

    Abstract: The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently m…

    Submitted 15 February, 2018; v1 submitted 12 December, 2017; originally announced December 2017.

    Comments: Accepted in ICASSP 2018