Showing 1–50 of 56 results for author: Fazekas, G

  1. arXiv:2506.07199  [pdf, ps, other]

    cs.SD cs.LG eess.AS eess.SP

    Audio synthesizer inversion in symmetric parameter spaces with approximately equivariant flow matching

    Authors: Ben Hayes, Charalampos Saitis, György Fazekas

    Abstract: Many audio synthesizers can produce the same signal given different parameter configurations, meaning the inversion from sound to parameters is an inherently ill-posed problem. We show that this is largely due to intrinsic symmetries of the synthesizer, and focus in particular on permutation invariance. First, we demonstrate on a synthetic task that regressing point estimates under permutation sym…

    Submitted 8 June, 2025; originally announced June 2025.

    Comments: Accepted at ISMIR 2025

  2. arXiv:2505.11315  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior

    Authors: Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas

    Abstract: Style Transfer with Inference-Time Optimisation (ST-ITO) is a recent approach for transferring the applied effects of a reference audio to a raw audio track. It optimises the effect parameters to minimise the distance between the style embeddings of the processed audio and the reference. However, this method treats all possible configurations equally and relies solely on the embedding space, which…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Submitted to WASPAA 2025

  3. arXiv:2505.03314  [pdf, other]

    cs.SD cs.AI eess.AS

    Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation

    Authors: Jincheng Zhang, György Fazekas, Charalampos Saitis

    Abstract: The recent surge in the popularity of diffusion models for image synthesis has attracted new attention to their potential for generation tasks in other domains. However, their applications to symbolic music generation remain largely under-explored because symbolic music is typically represented as sequences of discrete events and standard diffusion models are not well-suited for discrete data. We…

    Submitted 6 May, 2025; originally announced May 2025.

  4. arXiv:2504.14735  [pdf, other]

    cs.SD eess.AS

    DiffVox: A Differentiable Model for Capturing and Analysing Professional Effects Distributions

    Authors: Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas, Yuki Mitsufuji

    Abstract: This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for "Differentiable Vocal Fx", integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Vocal presets are retrieved from two datasets, compris…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Submitted to DAFx 2025

  5. arXiv:2501.17578  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Efficiently compressing high-dimensional audio signals into a compact and informative latent space is crucial for various tasks, including generative modeling and music information retrieval (MIR). Existing audio autoencoders, however, often struggle to achieve high compression ratios while preserving audio fidelity and facilitating efficient downstream applications. We introduce Music2Latent2, a…

    Submitted 29 January, 2025; originally announced January 2025.

    Comments: Accepted to ICASSP 2025

  6. arXiv:2501.10222  [pdf, other]

    cs.SD eess.AS

    Towards An Integrated Approach for Expressive Piano Performance Synthesis from Music Scores

    Authors: Jingjing Tang, Erica Cooper, Xin Wang, Junichi Yamagishi, George Fazekas

    Abstract: This paper presents an integrated system that transforms symbolic music scores into expressive piano performance audio. By combining a Transformer-based Expressive Performance Rendering (EPR) model with a fine-tuned neural MIDI synthesiser, our approach directly generates expressive audio performances from score inputs. To the best of our knowledge, this is the first system to offer a streamlined…

    Submitted 17 January, 2025; originally announced January 2025.

    Comments: Accepted by ICASSP 2025

  7. arXiv:2412.18955  [pdf, other]

    cs.SD eess.AS

    Leave-One-EquiVariant: Alleviating invariance-related information loss in contrastive music representations

    Authors: Julien Guinot, Elio Quinton, György Fazekas

    Abstract: Contrastive learning has proven effective in self-supervised musical representation learning, particularly for Music Information Retrieval (MIR) tasks. However, reliance on augmentation chains for contrastive view generation and the resulting learnt invariances pose challenges when different downstream tasks require sensitivity to certain musical attributes. To address this, we propose the Leave O…

    Submitted 25 December, 2024; originally announced December 2024.

  8. arXiv:2412.10968  [pdf, ps, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Composers' Evaluations of an AI Music Tool: Insights for Human-Centred Design

    Authors: Eleanor Row, György Fazekas

    Abstract: We present a study that explores the role of user-centred design in developing Generative AI (GenAI) tools for music composition. Through semi-structured interviews with professional composers, we gathered insights on a novel generative model for creating variations, highlighting concerns around trust, transparency, and ethical design. The findings helped form a feedback loop, guiding improvements…

    Submitted 14 December, 2024; originally announced December 2024.

    Comments: Accepted to NeurIPS 2024 Workshop on Generative AI and Creativity: A dialogue between machine learning researchers and creative professionals in Vancouver, Canada

  9. arXiv:2412.04610  [pdf, other]

    cs.SD cs.LG eess.AS

    Exploring Transformer-Based Music Overpainting for Jazz Piano Variations

    Authors: Eleanor Row, Ivan Shanin, György Fazekas

    Abstract: This paper explores transformer-based models for music overpainting, focusing on jazz piano variations. Music overpainting generates new variations while preserving the melodic and harmonic structure of the input. Existing approaches are limited by small datasets, restricting scalability and diversity. We introduce VAR4000, a subset of a larger dataset for jazz piano performances, consisting of 4,…

    Submitted 5 December, 2024; originally announced December 2024.

    Comments: Accepted and presented as a Late-Breaking Demo at the 25th International Society for Music Information Retrieval (ISMIR) in San Francisco, US, 2024

  10. arXiv:2411.18447  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS

    Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation

    Authors: Marco Pasini, Javier Nistal, Stefan Lattner, George Fazekas

    Abstract: Autoregressive models are typically applied to sequences of discrete tokens, but recent research indicates that generating sequences of continuous embeddings in an autoregressive manner is also feasible. However, such Continuous Autoregressive Models (CAMs) can suffer from a decline in generation quality over extended sequences due to error accumulation during inference. We introduce a novel metho…

    Submitted 27 November, 2024; originally announced November 2024.

    Comments: Accepted to NeurIPS 2024 - Audio Imagination Workshop

  11. arXiv:2411.16405  [pdf, other]

    cs.CV cs.AI eess.IV

    Synthesising Handwritten Music with GANs: A Comprehensive Evaluation of CycleWGAN, ProGAN, and DCGAN

    Authors: Elona Shatri, Kalikidhar Palavala, George Fazekas

    Abstract: The generation of handwritten music sheets is a crucial step toward enhancing Optical Music Recognition (OMR) systems, which rely on large and diverse datasets for optimal performance. However, handwritten music sheets, often found in archives, present challenges for digitisation due to their fragility, varied handwriting styles, and image quality. This paper addresses the data scarcity problem by…

    Submitted 25 November, 2024; originally announced November 2024.

    Comments: 10 pages, one page references, to appear on the IEEE Big Data 2024 2nd Workshop on AI Music Generation (AIMG 2024)

  12. arXiv:2411.06576  [pdf, other]

    eess.AS cs.SD

    Diff-MSTC: A Mixing Style Transfer Prototype for Cubase

    Authors: Soumya Sai Vanka, Lennart Hannink, Jean-Baptiste Rolland, George Fazekas

    Abstract: In our demo, participants are invited to explore the Diff-MSTC prototype, which integrates the Diff-MST model into Steinberg's digital audio workstation (DAW), Cubase. Diff-MST, a deep learning model for mixing style transfer, forecasts mixing console parameters for tracks using a reference song. The system processes up to 20 raw tracks along with a reference song to predict mixing console paramet…

    Submitted 10 November, 2024; originally announced November 2024.

    Comments: Presented at 2024 International Society for Music Information Retrieval

  13. arXiv:2409.07918  [pdf]

    cs.HC cs.AI cs.SD eess.AS

    Tidal MerzA: Combining affective modelling and autonomous code generation through Reinforcement Learning

    Authors: Elizabeth Wilson, György Fazekas, Geraint Wiggins

    Abstract: This paper presents Tidal-MerzA, a novel system designed for collaborative performances between humans and a machine agent in the context of live coding, specifically focusing on the generation of musical patterns. Tidal-MerzA fuses two foundational models: ALCAA (Affective Live Coding Autonomous Agent) and Tidal Fuzz, a computational framework. By integrating affective modelling with computationa…

    Submitted 12 September, 2024; originally announced September 2024.

  14. arXiv:2408.14340  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Foundation Models for Music: A Survey

    Authors: Yinghao Ma, Anders Øland, Anton Ragni, Bleiz MacSen Del Sette, Charalampos Saitis, Chris Donahue, Chenghua Lin, Christos Plachouras, Emmanouil Benetos, Elona Shatri, Fabio Morreale, Ge Zhang, György Fazekas, Gus Xia, Huan Zhang, Ilaria Manco, Jiawen Huang, Julien Guinot, Liwei Lin, Luca Marinelli, Max W. Y. Lam, Megha Sharma, Qiuqiang Kong, Roger B. Dannenberg, Ruibin Yuan, et al. (17 additional authors not shown)

    Abstract: In recent years, foundation models (FMs) such as large language models (LLMs) and latent diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music, spanning from representation learning, generative learning and multimodal learning. We first contextualise the signifi…

    Submitted 3 September, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  15. arXiv:2408.06500  [pdf, other]

    cs.SD cs.LG eess.AS

    Music2Latent: Consistency Autoencoders for Latent Audio Compression

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Efficient audio representations in a compressed continuous latent space are critical for generative audio modeling and Music Information Retrieval (MIR) tasks. However, some existing audio autoencoders have limitations, such as multi-stage training procedures, slow iterative sampling, or low reconstruction quality. We introduce Music2Latent, an audio autoencoder that overcomes these limitations by…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ISMIR 2024

  16. arXiv:2408.01337  [pdf, other]

    cs.SD cs.CL cs.LG cs.MM eess.AS

    MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

    Authors: Benno Weck, Ilaria Manco, Emmanouil Benetos, Elio Quinton, George Fazekas, Dmitry Bogdanov

    Abstract: Multimodal models that jointly process audio and language hold great promise in audio understanding and are increasingly being adopted in the music domain. By allowing users to query via text and obtain information about a given audio input, these models have the potential to enable a variety of music understanding tasks via language-based interfaces. However, their evaluation poses considerable c…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at ISMIR 2024. Data: https://doi.org/10.5281/zenodo.12709974 Code: https://github.com/mulab-mir/muchomusic Supplementary material: https://mulab-mir.github.io/muchomusic

  17. arXiv:2407.13840  [pdf, other]

    eess.AS

    Semi-Supervised Contrastive Learning of Musical Representations

    Authors: Julien Guinot, Elio Quinton, György Fazekas

    Abstract: Despite the success of contrastive learning in Music Information Retrieval, the inherent ambiguity of contrastive self-supervision presents a challenge. Relying solely on augmentation chains and self-supervised positive sampling strategies can lead to a pretraining objective that does not capture key musical information for downstream tasks. We introduce semi-supervised contrastive learning (SemiS…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to be published at the Proceedings of the 25th International Society for Music Information Retrieval Conference 2024, Includes non-proceedings appendix

  18. arXiv:2407.08889  [pdf, other]

    eess.AS cs.SD

    Diff-MST: Differentiable Mixing Style Transfer

    Authors: Soumya Sai Vanka, Christian Steinmetz, Jean-Baptiste Rolland, Joshua Reiss, George Fazekas

    Abstract: Mixing style transfer automates the generation of a multitrack mix for a given set of tracks by inferring production attributes from a reference song. However, existing systems for mixing style transfer are limited in that they often operate only on a fixed number of tracks, introduce artifacts, and produce mixes in an end-to-end fashion, without grounding in traditional audio effects, prohibiting…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to be published at the Proceedings of the 25th International Society for Music Information Retrieval Conference 2024

  19. Differentiable Time-Varying Linear Prediction in the Context of End-to-End Analysis-by-Synthesis

    Authors: Chin-Yun Yu, György Fazekas

    Abstract: Training the linear prediction (LP) operator end-to-end for audio synthesis in modern deep learning frameworks is slow due to its recursive formulation. In addition, frame-wise approximation as an acceleration method cannot generalise well to test time conditions where the LP is computed sample-wise. Efficient differentiable sample-wise LP for end-to-end training is the key to removing this barrie…

    Submitted 18 October, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

    Comments: Published at Interspeech 2024

  20. arXiv:2405.06804  [pdf, other]

    cs.SD eess.AS eess.SP

    Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming

    Authors: Chin-Yun Yu, Johan Pauwels, György Fazekas

    Abstract: In binaural audio synthesis, aligning head-related impulse responses (HRIRs) in time has been an important pre-processing step, enabling accurate spatial interpolation and efficient data compression. The maximum correlation time delay between spatially nearby HRIRs has previously been used to get accurate and smooth alignment by solving a matrix equation in which the solution has the minimum Eucli…

    Submitted 19 October, 2024; v1 submitted 10 May, 2024; originally announced May 2024.

    Comments: Published at Audio Engineering Society 156th Convention, 2024 June, Madrid, Spain

  21. arXiv:2404.07970  [pdf, other]

    eess.AS cs.LG cs.SD

    Differentiable All-pole Filters for Time-varying Audio Systems

    Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

    Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous wo…

    Submitted 18 October, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: Published at DAFx 2024

  22. arXiv:2311.13058  [pdf, other]

    cs.SD eess.AS

    Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates

    Authors: Marco Pasini, Stefan Lattner, George Fazekas

    Abstract: Music source separation is focused on extracting distinct sonic elements from composite tracks. Historically, many methods have been grounded in supervised learning, necessitating labeled data, which is occasionally constrained in its diversity. More recent methods have delved into N-shot techniques that utilize one or more audio samples to aid in the separation. However, a challenge with some of…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 4 pages, 2 figures, 1 table; Accepted at the 37th Conference on Neural Information Processing Systems (2023), Machine Learning for Audio Workshop

  23. arXiv:2311.10057  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

    Authors: Ilaria Manco, Benno Weck, SeungHeon Doh, Minz Won, Yixiao Zhang, Dmitry Bogdanov, Yusong Wu, Ke Chen, Philip Tovstogan, Emmanouil Benetos, Elio Quinton, György Fazekas, Juhan Nam

    Abstract: We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Common licenses. To showcase the use of our dataset, we benchmark popular models o…

    Submitted 22 November, 2023; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023 Workshop on Machine Learning for Audio

  24. arXiv:2311.07345  [pdf, other]

    eess.AS cs.SD

    Zero-Shot Duet Singing Voices Separation with Diffusion Models

    Authors: Chin-Yun Yu, Emilian Postolache, Emanuele Rodolà, György Fazekas

    Abstract: In recent studies, diffusion models have shown promise as priors for solving audio inverse problems. These models allow us to sample from the posterior distribution of a target signal given an observed signal by manipulating the diffusion process. However, when separating audio sources of the same type, such as duet singing voices, the prior learned by the diffusion process may not be sufficient t…

    Submitted 19 October, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: 9 pages, 1 figure. Published at Sound Demixing Workshop 2023

  25. arXiv:2310.14044  [pdf, other]

    cs.SD cs.AI eess.AS

    Composer Style-specific Symbolic Music Generation Using Vector Quantized Discrete Diffusion Models

    Authors: Jincheng Zhang, György Fazekas, Charalampos Saitis

    Abstract: Emerging Denoising Diffusion Probabilistic Models (DDPM) have become increasingly utilised because of promising results they have achieved in diverse generative tasks with continuous data, such as image and sound synthesis. Nonetheless, the success of diffusion models has not been fully extended to discrete symbolic music. We propose to combine a vector quantized variational autoencoder (VQ-VAE) a…

    Submitted 3 September, 2024; v1 submitted 21 October, 2023; originally announced October 2023.

  26. arXiv:2310.14040  [pdf, other]

    cs.SD cs.AI eess.AS

    Fast Diffusion GAN Model for Symbolic Music Generation Controlled by Emotions

    Authors: Jincheng Zhang, György Fazekas, Charalampos Saitis

    Abstract: Diffusion models have shown promising results for a wide range of generative tasks with continuous data, such as image and audio synthesis. However, little progress has been made on using diffusion models to generate discrete symbolic music because this new class of generative models are not well suited for discrete data while its iterative sampling process is computationally expensive. In this wo…

    Submitted 21 October, 2023; originally announced October 2023.

  27. arXiv:2310.00699  [pdf, other]

    cs.SD eess.AS

    Pianist Identification Using Convolutional Neural Networks

    Authors: Jingjing Tang, Geraint Wiggins, Gyorgy Fazekas

    Abstract: This paper presents a comprehensive study of automatic performer identification in expressive piano performances using convolutional neural networks (CNNs) and expressive features. Our work addresses the challenging multi-class classification task of identifying virtuoso pianists, which has substantial implications for building dynamic musical instruments with intelligence and smart musical system…

    Submitted 1 October, 2023; originally announced October 2023.

    Comments: 6 pages, 3 figures, accepted by the 4th International Symposium on the Internet of Sounds, IS2 2023

  28. arXiv:2309.03404  [pdf, other]

    cs.HC cs.AI eess.AS

    The Role of Communication and Reference Songs in the Mixing Process: Insights from Professional Mix Engineers

    Authors: Soumya Sai Vanka, Maryam Safi, Jean-Baptiste Rolland, György Fazekas

    Abstract: Effective music mixing requires technical and creative finesse, but clear communication with the client is crucial. The mixing engineer must grasp the client's expectations, and preferences, and collaborate to achieve the desired sound. The tacit agreement for the desired sound of the mix is often established using guides like reference songs and demo mixes exchanged between the artist and the eng…

    Submitted 29 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  29. arXiv:2308.16177  [pdf, other]

    cs.SD eess.AS

    General Purpose Audio Effect Removal

    Authors: Matthew Rice, Christian J. Steinmetz, George Fazekas, Joshua D. Reiss

    Abstract: Although the design and application of audio effects is well understood, the inverse problem of removing these effects is significantly more challenging and far less studied. Recently, deep learning has been applied to audio effect removal; however, existing approaches have focused on narrow formulations considering only one effect or source type at a time. In realistic scenarios, multiple effects…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: Preprint. Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2023

  30. arXiv:2308.15422  [pdf, other]

    cs.SD eess.AS

    A Review of Differentiable Digital Signal Processing for Music & Speech Synthesis

    Authors: Ben Hayes, Jordie Shier, György Fazekas, Andrew McPherson, Charalampos Saitis

    Abstract: The term "differentiable digital signal processing" describes a family of techniques in which loss function gradients are backpropagated through digital signal processors, facilitating their integration into neural networks. This article surveys the literature on differentiable audio signal processing, focusing on its use in music & speech synthesis. We catalogue applications to tasks including mu…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Under review for Frontiers in Signal Processing

  31. arXiv:2307.09670  [pdf, other]

    cs.SD cs.LG eess.AS

    JAZZVAR: A Dataset of Variations found within Solo Piano Performances of Jazz Standards for Music Overpainting

    Authors: Eleanor Row, Jingjing Tang, George Fazekas

    Abstract: Jazz pianists often uniquely interpret jazz standards. Passages from these interpretations can be viewed as sections of variation. We manually extracted such variations from solo jazz piano performances. The JAZZVAR dataset is a collection of 502 pairs of Variation and Original MIDI segments. Each Variation in the dataset is accompanied by a corresponding Original segment containing the melody and…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: Pre-print accepted for publication at CMMR2023, 12 pages, 4 figures

  32. Singing Voice Synthesis Using Differentiable LPC and Glottal-Flow-Inspired Wavetables

    Authors: Chin-Yun Yu, György Fazekas

    Abstract: This paper introduces GlOttal-flow LPC Filter (GOLF), a novel method for singing voice synthesis (SVS) that exploits the physical characteristics of the human voice using differentiable digital signal processing. GOLF employs a glottal model as the harmonic source and IIR filters to simulate the vocal tract, resulting in an interpretable and efficient approach. We show it is competitive with state…

    Submitted 18 October, 2024; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: 9 pages, 4 figures. Published at ISMIR 2023

  33. arXiv:2306.06040  [pdf, other]

    cs.SD cs.LG eess.AS

    Reconstructing Human Expressiveness in Piano Performances with a Transformer Network

    Authors: Jingjing Tang, Geraint Wiggins, Gyorgy Fazekas

    Abstract: Capturing intricate and subtle variations in human expressiveness in music performance using computational approaches is challenging. In this paper, we propose a novel approach for reconstructing human expressiveness in piano performance with a multi-layer bi-directional Transformer encoder. To address the needs for large amounts of accurately captured and score-aligned performance data in trainin…

    Submitted 1 October, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: 12 pages, 5 figures, accepted by CMMR2023, the 16th International Symposium on Computer Music Multidisciplinary Research

  34. arXiv:2304.03407  [pdf, other]

    cs.HC cs.AI cs.SD eess.AS

    Adoption of AI Technology in the Music Mixing Workflow: An Investigation

    Authors: Soumya Sai Vanka, Maryam Safi, Jean-Baptiste Rolland, George Fazekas

    Abstract: The integration of artificial intelligence (AI) technology in the music industry is driving a significant change in the way music is being composed, produced and mixed. This study investigates the current state of AI in the mixing workflows and its adoption by different user groups. Through semi-structured interviews, a questionnaire-based study, and analyzing web forums, the study confirms three…

    Submitted 8 September, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

    Journal ref: Paper number 10653, 154th AES Convention 2023

  35. arXiv:2301.13383  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation

    Authors: Yuqiang Li, Shengchen Li, George Fazekas

    Abstract: Pitch and meter are two fundamental music features for symbolic music generation tasks, where researchers usually choose different encoding methods depending on specific goals. However, the advantages and drawbacks of different encoding methods have not been frequently discussed. This paper presents an integrated analysis of the influence of two low-level features, pitch and meter, on the performanc…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: This is a draft before submitted to TISMIR as a journal paper

  36. arXiv:2301.10183  [pdf, other]

    cs.SD cs.LG eess.AS

    Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

    Authors: Cyrus Vahidi, Han Han, Changhong Wang, Mathieu Lagrange, György Fazekas, Vincent Lostanlen

    Abstract: Computer musicians refer to mesostructures as the intermediate levels of articulation between the microstructure of waveshapes and the macrostructure of musical forms. Examples of mesostructures include melody, arpeggios, syncopation, polyphonic grouping, and textural contrast. Despite their central role in musical expression, they have received limited attention in deep learning. Currently, autoe…

    Submitted 24 January, 2023; originally announced January 2023.

  37. Conditioning and Sampling in Variational Diffusion Models for Speech Super-Resolution

    Authors: Chin-Yun Yu, Sung-Lin Yeh, György Fazekas, Hao Tang

    Abstract: Recently, diffusion models (DMs) have been increasingly used in audio processing tasks, including speech super-resolution (SR), which aims to restore high-frequency content given low-resolution speech utterances. This is commonly achieved by conditioning the network of noise predictor with low-resolution audio. In this paper, we propose a novel sampling algorithm that communicates the information…

    Submitted 19 October, 2024; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: Published at ICASSP 2023

  38. arXiv:2210.15306  [pdf, other]

    cs.SD cs.LG eess.AS

    Rigid-Body Sound Synthesis with Differentiable Modal Resonators

    Authors: Rodrigo Diaz, Ben Hayes, Charalampos Saitis, György Fazekas, Mark Sandler

    Abstract: Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-to-end framework for training a deep neural networ…

    Submitted 28 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages

  39. arXiv:2210.14476  [pdf, other]

    eess.SP cs.SD eess.AS

    Sinusoidal Frequency Estimation by Gradient Descent

    Authors: Ben Hayes, Charalampos Saitis, György Fazekas

    Abstract: Sinusoidal parameter estimation is a fundamental task in applications from spectral analysis to time-series forecasting. Estimating the sinusoidal frequency parameter by gradient descent is, however, often impossible as the error function is non-convex and densely populated with local minima. The growing family of differentiable signal processing methods has therefore been unable to tune the frequ…

    Submitted 18 November, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Submitted to ICASSP 2023

  40. arXiv:2208.12208  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Contrastive Audio-Language Learning for Music

    Authors: Ilaria Manco, Emmanouil Benetos, Elio Quinton, György Fazekas

    Abstract: As one of the most intuitive interfaces known to humans, natural language has the potential to mediate many tasks that involve human-computer interaction, especially in application-focused fields like Music Information Retrieval. In this work, we explore cross-modal learning in an attempt to bridge audio and language in the music domain. To this end, we propose MusCALL, a framework for Music Contr…

    Submitted 25 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022

  41. arXiv:2205.08866  [pdf, other]

    cs.MM cs.SD eess.AS

    Seeing Sounds, Hearing Shapes: a gamified study to evaluate sound-sketches

    Authors: Sebastian Löbbers, György Fazekas

    Abstract: Sound-shape associations, a subset of cross-modal associations between the auditory and visual domain, have been studied mainly in the context of matching a set of purposefully crafted shapes to sounds. Recent studies have explored how humans represent sound through free-form sketching and how a graphical sketch input could be used for sound production. In this paper, the potential of communicatin…

    Submitted 18 May, 2022; originally announced May 2022.

    Comments: Accepted at International Computer Music Conference (ICMC) 2022

  42. arXiv:2204.08269  [pdf, other]

    cs.SD cs.LG eess.AS

    Differentiable Time-Frequency Scattering on GPU

    Authors: John Muradeli, Cyrus Vahidi, Changhong Wang, Han Han, Vincent Lostanlen, Mathieu Lagrange, George Fazekas

    Abstract: Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet…

    Submitted 19 July, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures. Submitted to the International Conference on Digital Audio Effects (DAFX) 2022

  43. arXiv:2112.04214  [pdf, other]

    cs.SD cs.CL cs.IR cs.LG eess.AS

    Learning music audio representations via weak language supervision

    Authors: Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas

    Abstract: Audio representations for music information retrieval are typically learned via supervised learning in a task-specific fashion. Although effective at producing state-of-the-art results, this scheme lacks flexibility with respect to the range of applications a model can have and requires extensively annotated datasets. In this work, we pose the question of whether it may be possible to exploit weak…

    Submitted 17 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted to ICASSP 2022

  44. arXiv:2111.08839  [pdf, other]

    cs.SD eess.AS

    Zero-shot Singing Technique Conversion

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: In this paper we propose modifications to the neural network framework AutoVC for the task of singing technique conversion. This includes utilising a pretrained singing technique encoder which extracts technique information, upon which a decoder is conditioned during training. By swapping out a source singer's technique information for that of the target's during conversion, the input spectrogram…

    Submitted 16 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021

  45. An Exploratory Study on Perceptual Spaces of the Singing Voice

    Authors: Brendan O'Connor, Simon Dixon, George Fazekas

    Abstract: Sixty participants provided dissimilarity ratings between various singing techniques. Multidimensional scaling, class averaging and clustering techniques were used to analyse timbral spaces and how they change between different singers, genders and registers. Clustering analysis showed that ground-truth similarity and silhouette scores that were not significantly different between gender or regist…

    Submitted 15 November, 2021; originally announced November 2021.

    Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 15-19, 2020

  46. arXiv:2108.02576  [pdf, other]

    cs.SD cs.IR cs.LG eess.AS

    Performer Identification From Symbolic Representation of Music Using Statistical Models

    Authors: Syed Rifat Mahmud Rafee, Gyorgy Fazekas, Geraint A. Wiggins

    Abstract: Music performers have their own idiosyncratic way of interpreting a musical piece. A group of skilled performers playing the same piece of music would likely inject their unique artistic styles into their performances. The variations of the tempo, timing, dynamics, articulation, etc. from the actual notated music are what make the performers unique in their performances. This study presents a data…

    Submitted 5 August, 2021; originally announced August 2021.

  47. arXiv:2107.07360  [pdf, other]

    cs.MM cs.HC cs.SD eess.AS

    Sketching sounds: an exploratory study on sound-shape associations

    Authors: Sebastian Löbbers, Mathieu Barthet, György Fazekas

    Abstract: Sound synthesiser controls typically correspond to technical parameters of signal processing algorithms rather than intuitive sound descriptors that relate to human perception of sound. This makes it difficult to realise sound ideas in a straightforward way. Cross-modal mappings, for example between gestures and sound, have been suggested as a more intuitive control mechanism. A large body of rese…

    Submitted 15 July, 2021; originally announced July 2021.

    Comments: accepted for International Computer Music Conference (ICMC) 2021

  48. arXiv:2107.05050  [pdf, other]

    cs.SD cs.LG eess.AS eess.SP

    Neural Waveshaping Synthesis

    Authors: Ben Hayes, Charalampos Saitis, György Fazekas

    Abstract: We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics…

    Submitted 27 July, 2021; v1 submitted 11 July, 2021; originally announced July 2021.

    Comments: Accepted to ISMIR 2021; See online supplement at https://benhayes.net/projects/nws/

  49. arXiv:2105.11836  [pdf, other]

    cs.SD cs.LG eess.AS

    A Modulation Front-End for Music Audio Tagging

    Authors: Cyrus Vahidi, Charalampos Saitis, György Fazekas

    Abstract: Convolutional Neural Networks have been extensively explored in the task of automatic music tagging. The problem can be approached by using either engineered time-frequency features or raw audio as input. Modulation filter bank representations that have been actively researched as a basis for timbre perception have the potential to facilitate the extraction of perceptually salient features. We exp…

    Submitted 25 May, 2021; originally announced May 2021.

  50. arXiv:2104.11984  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    MusCaps: Generating Captions for Music Audio

    Authors: Ilaria Manco, Emmanouil Benetos, Elio Quinton, Gyorgy Fazekas

    Abstract: Content-based music information retrieval has seen rapid progress with the adoption of deep learning. Current approaches to high-level music description typically make use of classification models, such as in auto-tagging or genre and mood classification. In this work, we propose to address music description via audio captioning, defined as the task of generating a natural language description of…

    Submitted 24 April, 2021; originally announced April 2021.

    Comments: Accepted to IJCNN 2021 for the Special Session on Representation Learning for Audio, Speech, and Music Processing