Showing 1–11 of 11 results for author: Henkel, F

Searching in archive cs.
  1. arXiv:2401.08902  [pdf, other]

    cs.SD cs.DL cs.IR cs.LG eess.AS

    Similar but Faster: Manipulation of Tempo in Music Audio Embeddings for Tempo Prediction and Search

    Authors: Matthew C. McCallum, Florian Henkel, Jaehun Kim, Samuel E. Sandberg, Matthew E. P. Davies

    Abstract: Audio embeddings enable large scale comparisons of the similarity of audio files for applications such as search and recommendation. Due to the subjectivity of audio similarity, it can be desirable to design systems that answer not only whether audio is similar, but similar in what way (e.g., wrt. tempo, mood or genre). Previous works have proposed disentangled embedding spaces where subspaces rep…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  2. arXiv:2401.08891  [pdf, other]

    cs.SD cs.LG eess.AS

    Tempo estimation as fully self-supervised binary classification

    Authors: Florian Henkel, Jaehun Kim, Matthew C. McCallum, Samuel E. Sandberg, Matthew E. P. Davies

    Abstract: This paper addresses the problem of global tempo estimation in musical audio. Given that annotating tempo is time-consuming and requires certain musical expertise, few publicly available data sources exist to train machine learning models for this task. Towards alleviating this issue, we propose a fully self-supervised approach that does not rely on any human labeled data. Our method builds on the…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  3. arXiv:2401.08889  [pdf, other]

    cs.SD cs.IR cs.LG cs.MM eess.AS

    On the Effect of Data-Augmentation on Local Embedding Properties in the Contrastive Learning of Music Audio Representations

    Authors: Matthew C. McCallum, Matthew E. P. Davies, Florian Henkel, Jaehun Kim, Samuel E. Sandberg

    Abstract: Audio embeddings are crucial tools in understanding large catalogs of music. Typically embeddings are evaluated on the basis of the performance they provide in a wide range of downstream tasks, however few studies have investigated the local properties of the embedding spaces themselves which are important in nearest neighbor algorithms, commonly used in music search and recommendation. In this wo…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: Accepted to the International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

  4. arXiv:2304.12939  [pdf, other]

    cs.SD cs.HC eess.AS

    The ACCompanion: Combining Reactivity, Robustness, and Musical Expressivity in an Automatic Piano Accompanist

    Authors: Carlos Cancino-Chacón, Silvan Peter, Patricia Hu, Emmanouil Karystinaios, Florian Henkel, Francesco Foscarin, Nimrod Varga, Gerhard Widmer

    Abstract: This paper introduces the ACCompanion, an expressive accompaniment system. Similarly to a musician who accompanies a soloist playing a given musical piece, our system can produce a human-like rendition of the accompaniment part that follows the soloist's choices in terms of tempo, dynamics, and articulation. The ACCompanion works in the symbolic domain, i.e., it needs a musical instrument capable…

    Submitted 30 May, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI-23), Macao, China. The differences/extensions with the previous version include a technical appendix, added missing links, and minor text updates. 10 pages, 4 figures

  5. arXiv:2111.06643  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    Fully Automatic Page Turning on Real Scores

    Authors: Florian Henkel, Stephanie Schwaiger, Gerhard Widmer

    Abstract: We present a prototype of an automatic page turning system that works directly on real scores, i.e., sheet images, without any symbolic representation. Our system is based on a multi-modal neural network architecture that observes a complete sheet image page as input, listens to an incoming musical performance, and predicts the corresponding position in the image. Using the position estimation of…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: ISMIR 2021 Late Breaking/Demo

  6. arXiv:2107.08933  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    Over-Parameterization and Generalization in Audio Classification

    Authors: Khaled Koutini, Hamid Eghbal-zadeh, Florian Henkel, Jan Schlüter, Gerhard Widmer

    Abstract: Convolutional Neural Networks (CNNs) have been dominating classification tasks in various domains, such as machine vision, machine listening, and natural language processing. In machine listening, while generally exhibiting very good generalization capabilities, CNNs are sensitive to the specific audio recording device used, which has been recognized as a substantial problem in the acoustic scene…

    Submitted 19 July, 2021; originally announced July 2021.

    Comments: Presented at the ICML 2021 Workshop on Overparameterization: Pitfalls & Opportunities

  7. arXiv:2105.04309  [pdf, other]

    cs.SD cs.LG eess.AS

    Multi-modal Conditional Bounding Box Regression for Music Score Following

    Authors: Florian Henkel, Gerhard Widmer

    Abstract: This paper addresses the problem of sheet-image-based on-line audio-to-score alignment also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts x,y coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a sy…

    Submitted 10 May, 2021; originally announced May 2021.

    Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021

  8. arXiv:2011.02955  [pdf, other]

    cs.LG cs.NE cs.SD

    Low-Complexity Models for Acoustic Scene Classification Based on Receptive Field Regularization and Frequency Damping

    Authors: Khaled Koutini, Florian Henkel, Hamid Eghbal-zadeh, Gerhard Widmer

    Abstract: Deep Neural Networks are known to be very demanding in terms of computing and memory requirements. Due to the ever increasing use of embedded systems and mobile devices with a limited resource budget, designing low-complexity models without sacrificing too much of their predictive performance gained great importance. In this work, we investigate and compare several well-known methods to reduce the…

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2020 Workshop (DCASE2020)

  9. arXiv:2007.10736  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Learning to Read and Follow Music in Complete Score Sheet Images

    Authors: Florian Henkel, Rainer Kelz, Gerhard Widmer

    Abstract: This paper addresses the task of score following in sheet music given as unprocessed images. While existing work either relies on OMR software to obtain a computer-readable score representation, or crucially relies on prepared sheet image excerpts, we propose the first system that directly performs score following in full-page, completely unprocessed sheet images. Based on incoming audio and a giv…

    Submitted 21 July, 2020; originally announced July 2020.

    Comments: Published in the Proceedings of the 21st International Society for Music Information Retrieval Conference, Montréal, Canada 2020

  10. arXiv:1910.07254  [pdf, other]

    cs.LG cs.SD eess.AS stat.ML

    Audio-Conditioned U-Net for Position Estimation in Full Sheet Images

    Authors: Florian Henkel, Rainer Kelz, Gerhard Widmer

    Abstract: The goal of score following is to track a musical performance, usually in the form of audio, in a corresponding score representation. Established methods mainly rely on computer-readable scores in the form of MIDI or MusicXML and achieve robust and reliable tracking results. Recently, multimodal deep learning methods have been used to follow along musical performances in raw sheet images. Among th…

    Submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted at International Workshop on Reading Music Systems 2019 (WoRMS)

  11. arXiv:1807.06391  [pdf, other]

    cs.AI cs.LG cs.SD eess.AS

    Learning to Listen, Read, and Follow: Score Following as a Reinforcement Learning Game

    Authors: Matthias Dorfer, Florian Henkel, Gerhard Widmer

    Abstract: Score following is the process of tracking a musical performance (audio) with respect to a known symbolic representation (a score). We start this paper by formulating score following as a multimodal Markov Decision Process, the mathematical foundation for sequential decision making. Given this formal definition, we address the score following task with state-of-the-art deep reinforcement learning…

    Submitted 17 July, 2018; originally announced July 2018.

    Comments: Published in the Proceedings of the 19th International Society for Music Information Retrieval Conference, Paris, France, 2018