Showing 1–12 of 12 results for author: Kelz, R

  1. arXiv:2211.15524  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Differentiable Dictionary Search: Integrating Linear Mixing with Deep Non-Linear Modelling for Audio Source Separation

    Authors: Lukáš Samuel Marták, Rainer Kelz, Gerhard Widmer

    Abstract: This paper describes several improvements to a new method for signal decomposition that we recently formulated under the name of Differentiable Dictionary Search (DDS). The fundamental idea of DDS is to exploit a class of powerful deep invertible density estimators called normalizing flows, to model the dictionary in a linear decomposition method such as NMF, effectively creating a bijection betwe…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Published in the Proceedings of the 24th International Congress on Acoustics (ICA 2022), Gyeongju, Korea, October 24-28, 2022

  2. Probabilistic Modelling of Signal Mixtures with Differentiable Dictionaries

    Authors: Lukáš Samuel Marták, Rainer Kelz, Gerhard Widmer

    Abstract: We introduce a novel way to incorporate prior information into (semi-) supervised non-negative matrix factorization, which we call differentiable dictionary search. It enables general, highly flexible and principled modelling of mixtures where non-linear sources are linearly mixed. We study its behavior on an audio decomposition task, and conduct an extensive, highly controlled study of its modell…

    Submitted 28 November, 2022; originally announced November 2022.

    Comments: Published in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO 2021), Dublin, Ireland, August 23-27, 2021 (IEEE), 441-445

  3. arXiv:2007.10736  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Learning to Read and Follow Music in Complete Score Sheet Images

    Authors: Florian Henkel, Rainer Kelz, Gerhard Widmer

    Abstract: This paper addresses the task of score following in sheet music given as unprocessed images. While existing work either relies on OMR software to obtain a computer-readable score representation, or crucially relies on prepared sheet image excerpts, we propose the first system that directly performs score following in full-page, completely unprocessed sheet images. Based on incoming audio and a giv…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: Published in the Proceedings of the 21st International Society for Music Information Retrieval Conference, Montréal, Canada 2020

  4. arXiv:1910.07254  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Audio-Conditioned U-Net for Position Estimation in Full Sheet Images

    Authors: Florian Henkel, Rainer Kelz, Gerhard Widmer

    Abstract: The goal of score following is to track a musical performance, usually in the form of audio, in a corresponding score representation. Established methods mainly rely on computer-readable scores in the form of MIDI or MusicXML and achieve robust and reliable tracking results. Recently, multimodal deep learning methods have been used to follow along musical performances in raw sheet images. Among th…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted at International Workshop on Reading Music Systems 2019 (WoRMS)

  5. arXiv:1909.01622  [pdf, other]

    cs.SD eess.AS

    Towards Interpretable Polyphonic Transcription with Invertible Neural Networks

    Authors: Rainer Kelz, Gerhard Widmer

    Abstract: We explore a novel way of conceptualising the task of polyphonic music transcription, using so-called invertible neural networks. Invertible models unify both discriminative and generative aspects in one function, sharing one set of parameters. Introducing invertibility enables the practitioner to directly inspect what the discriminative model has learned, and exactly determine which inputs lead t…

    Submitted 4 September, 2019; originally announced September 2019.

    Comments: Published at the 20th International Society for Music Information Retrieval Conference, Delft, The Netherlands, 2019

  6. Deep Polyphonic ADSR Piano Note Transcription

    Authors: Rainer Kelz, Sebastian Böck, Gerhard Widmer

    Abstract: We investigate a late-fusion approach to piano transcription, combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network architecture under consideration is compact in terms of its number of parameters and easy to train with gradient descent. The network outputs are fused over time in the final stage to obtain note segmentations, with an HMM whose tra…

    Submitted 21 June, 2019; originally announced June 2019.

    Comments: 5 pages, 2 figures, published at ICASSP'19

    Journal ref: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 246-250

  7. arXiv:1902.04390  [pdf, other]

    cs.SD eess.AS

    Multitask Learning for Polyphonic Piano Transcription, a Case Study

    Authors: Rainer Kelz, Sebastian Böck, Gerhard Widmer

    Abstract: Viewing polyphonic piano transcription as a multitask learning problem, where we need to simultaneously predict onsets, intermediate frames and offsets of notes, we investigate the performance impact of additional prediction targets, using a variety of suitable convolutional neural network architectures. We quantify performance differences of additional objectives on the large MAESTRO dataset.

    Submitted 12 February, 2019; originally announced February 2019.

  8. arXiv:1805.11526  [pdf, other]

    cs.SD cs.LG eess.AS

    Learning to Transcribe by Ear

    Authors: Rainer Kelz, Gerhard Widmer

    Abstract: Rethinking how to model polyphonic transcription formally, we frame it as a reinforcement learning task. Such a task formulation encompasses the notion of a musical agent and an environment containing an instrument as well as the sound source to be transcribed. Within this conceptual framework, the transcription process can be described as the agent interacting with the instrument in the environme…

    Submitted 29 May, 2018; originally announced May 2018.

  9. arXiv:1805.10880  [pdf, other]

    cs.SD cs.LG eess.AS

    Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling

    Authors: Rainer Kelz, Gerhard Widmer

    Abstract: We measure the effect of small amounts of systematic and random label noise caused by slightly misaligned ground truth labels in a fine grained audio signal labeling task. The task we choose to demonstrate these effects on is also known as framewise polyphonic transcription or note quantized multi-f0 estimation, and transforms a monaural audio signal into a sequence of note indicator labels. It wi…

    Submitted 28 May, 2018; originally announced May 2018.

    Comments: accepted at ICASSP 2018

  10. arXiv:1702.00025  [pdf, other]

    cs.SD

    An Experimental Analysis of the Entanglement Problem in Neural-Network-based Music Transcription Systems

    Authors: Rainer Kelz, Gerhard Widmer

    Abstract: Several recent polyphonic music transcription systems have utilized deep neural networks to achieve state of the art results on various benchmark datasets, pushing the envelope on framewise and note-level performance measures. Unfortunately we can observe a sort of glass ceiling effect. To investigate this effect, we provide a detailed analysis of the particular kinds of errors that state of the a…

    Submitted 31 January, 2017; originally announced February 2017.

    Comments: Submitted to AES Conference on Semantic Audio, Erlangen, Germany, June 22-24, 2017

  11. arXiv:1612.05153  [pdf, other]

    cs.SD cs.LG

    On the Potential of Simple Framewise Approaches to Piano Transcription

    Authors: Rainer Kelz, Matthias Dorfer, Filip Korzeniowski, Sebastian Böck, Andreas Arzt, Gerhard Widmer

    Abstract: In an attempt at exploring the limitations of simple approaches to the task of piano transcription (as usually defined in MIR), we conduct an in-depth analysis of neural network-based framewise transcription. We systematically compare different popular input representations for transcription systems to determine the ones most suitable for use with neural networks. Exploiting recent advances in tra…

    Submitted 15 December, 2016; originally announced December 2016.

    Comments: Proceedings of the 17th International Society for Music Information Retrieval Conference (ISMIR 2016), New York, NY

  12. arXiv:1511.04707  [pdf, other]

    cs.LG

    Deep Linear Discriminant Analysis

    Authors: Matthias Dorfer, Rainer Kelz, Gerhard Widmer

    Abstract: We introduce Deep Linear Discriminant Analysis (DeepLDA) which learns linearly separable latent representations in an end-to-end fashion. Classic LDA extracts features which preserve class separability and is used for dimensionality reduction for many classification problems. The central idea of this paper is to put LDA on top of a deep neural network. This can be seen as a non-linear extension of…

    Submitted 17 February, 2016; v1 submitted 15 November, 2015; originally announced November 2015.

    Comments: Published as a conference paper at ICLR 2016