
Showing 1–23 of 23 results for author: Sandler, M

Searching in archive eess.
  1. arXiv:2506.15514

    cs.SD eess.AS

    Exploiting Music Source Separation for Automatic Lyrics Transcription with Whisper

    Authors: Jaza Syed, Ivan Meresman Higgs, Ondřej Cífka, Mark Sandler

    Abstract: Automatic lyrics transcription (ALT) remains a challenging task in the field of music information retrieval, despite great advances in automatic speech recognition (ASR) brought about by transformer-based architectures in recent years. One of the major challenges in ALT is the high amplitude of interfering audio signals relative to conventional ASR due to musical accompaniment. Recent advances in…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Accepted at 2025 ICME Workshop AI for Music

  2. arXiv:2505.05940

    cs.SD cs.LG eess.AS physics.comp-ph

    Fast Differentiable Modal Simulation of Non-linear Strings, Membranes, and Plates

    Authors: Rodrigo Diaz, Mark Sandler

    Abstract: Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-ac…

    Submitted 26 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

    Comments: accepted to DAFx 2025

  3. arXiv:2503.11562

    cs.SD cs.AI cs.LG eess.AS

    Designing Neural Synthesizers for Low-Latency Interaction

    Authors: Franco Caspe, Jordie Shier, Mark Sandler, Charalampos Saitis, Andrew McPherson

    Abstract: Neural Audio Synthesis (NAS) models offer interactive musical control over high-quality, expressive audio generators. While these models can operate in real-time, they often suffer from high latency, making them unsuitable for intimate musical interaction. The impact of architectural choices in deep learning models on audio latency remains largely unexplored in the NAS literature. In this work, we…

    Submitted 11 April, 2025; v1 submitted 14 March, 2025; originally announced March 2025.

    Comments: See website at fcaspe.github.io/brave - 13 pages, 5 figures, accepted to the Journal of the Audio Engineering Society, LaTeX; Corrected typos, added hyphen to title to reflect JAES version

  4. arXiv:2408.16650

    cs.SD cs.LG eess.AS physics.comp-ph

    Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods

    Authors: Rodrigo Diaz, Carlos De La Vega Martin, Mark Sandler

    Abstract: This paper presents an examination of State Space Models (SSM) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings. Through experiments with datasets generated under different initial conditions and sample rates, we assess the capacity of these models to accurately model the complex behaviours observed in string dynamics. Our findings indi…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted to DAFx2024

  5. arXiv:2310.04811

    cs.SD cs.NE eess.AS eess.SY

    FM Tone Transfer with Envelope Learning

    Authors: Franco Caspe, Andrew McPherson, Mark Sandler

    Abstract: Tone Transfer is a novel deep-learning technique for interfacing a sound source with a synthesizer, transforming the timbre of audio excerpts while keeping their musical form content. Due to its good audio quality results and continuous controllability, it has been recently applied in several audio processing tools. Nevertheless, it still presents several shortcomings related to poor sound diversi…

    Submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted to Audio Mostly 2023

  6. arXiv:2309.06649

    cs.SD eess.AS

    Differentiable Modelling of Percussive Audio with Transient and Spectral Synthesis

    Authors: Jordie Shier, Franco Caspe, Andrew Robertson, Mark Sandler, Charalampos Saitis, Andrew McPherson

    Abstract: Differentiable digital signal processing (DDSP) techniques, including methods for audio synthesis, have gained attention in recent years and lend themselves to interpretability in the parameter space. However, current differentiable synthesis methods have not explicitly sought to model the transient portion of signals, which is important for percussive sounds. In this work, we present a unified sy…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: To be published in The Proceedings of Forum Acusticum, Sep 2023, Turin, Italy

  7. arXiv:2309.02404

    cs.SD cs.CV eess.AS

    Voice Morphing: Two Identities in One Voice

    Authors: Sushanta K. Pani, Anurag Chowdhury, Morgan Sandler, Arun Ross

    Abstract: In a biometric system, each biometric sample or template is typically associated with a single identity. However, recent research has demonstrated the possibility of generating "morph" biometric samples that can successfully match more than a single identity. Morph attacks are now recognized as a potential security threat to biometric systems. However, most morph attacks have been studied on biome…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Accepted oral paper at BIOSIG 2023

  8. arXiv:2305.14867

    cs.SD cs.HC eess.AS

    Interactive Neural Resonators

    Authors: Rodrigo Diaz, Charalampos Saitis, Mark Sandler

    Abstract: In this work, we propose a method for the controllable synthesis of real-time contact sounds using neural resonators. Previous works have used physically inspired statistical methods and physical modelling for object materials and excitation signals. Our method incorporates differentiable second-order resonators and estimates their coefficients using a neural network that is conditioned on physica…

    Submitted 24 May, 2023; originally announced May 2023.

  9. arXiv:2305.07997

    eess.AS cs.SD

    Vocal Style Factorization for Effective Speaker Recognition in Affective Scenarios

    Authors: Morgan Sandler, Arun Ross

    Abstract: The accuracy of automated speaker recognition is negatively impacted by change in emotions in a person's speech. In this paper, we hypothesize that speaker identity is composed of various vocal style factors that may be learned from unlabeled data and re-combined using a neural network to generate a holistic speaker identity representation for affective scenarios. In this regard, we propose the E-…

    Submitted 3 August, 2023; v1 submitted 13 May, 2023; originally announced May 2023.

    Comments: Proceedings of the IEEE 2023 International Joint Conference on Biometrics (IJCB)

  10. arXiv:2211.08213

    eess.AS cs.SD

    Is Style All You Need? Dependencies Between Emotion and GST-based Speaker Recognition

    Authors: Morgan Sandler, Arun Ross

    Abstract: In this work, we study the hypothesis that speaker identity embeddings extracted from speech samples may be used for detection and classification of emotion. In particular, we show that emotions can be effectively identified by learning speaker identities by use of a 1-D Triplet Convolutional Neural Network (CNN) & Global Style Token (GST) scheme (e.g., DeepTalk Network) and reusing the trained sp…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  11. arXiv:2210.15306

    cs.SD cs.LG eess.AS

    Rigid-Body Sound Synthesis with Differentiable Modal Resonators

    Authors: Rodrigo Diaz, Ben Hayes, Charalampos Saitis, György Fazekas, Mark Sandler

    Abstract: Physical models of rigid bodies are used for sound synthesis in applications from virtual environments to music production. Traditional methods such as modal synthesis often rely on computationally expensive numerical solvers, while recent deep learning approaches are limited by post-processing of their results. In this work we present a novel end-to-end framework for training a deep neural networ…

    Submitted 28 October, 2022; v1 submitted 27 October, 2022; originally announced October 2022.

    Comments: 5 pages

  12. arXiv:2208.06169

    cs.SD cs.LG eess.AS eess.SP

    DDX7: Differentiable FM Synthesis of Musical Instrument Sounds

    Authors: Franco Caspe, Andrew McPherson, Mark Sandler

    Abstract: FM Synthesis is a well-known algorithm used to generate complex timbre from a compact set of design primitives. Typically featuring a MIDI interface, it is usually impractical to control it from an audio source. On the other hand, Differentiable Digital Signal Processing (DDSP) has enabled nuanced audio rendering by Deep Neural Networks (DNNs) that learn to control differentiable synthesis layers…

    Submitted 12 August, 2022; originally announced August 2022.

    Comments: Accepted to ISMIR 2022. See online supplement at https://fcaspe.github.io/ddx7/

    ACM Class: H.5.5; I.2.6

  13. arXiv:2204.04651

    cs.SD cs.IR eess.AS

    Deep Conditional Representation Learning for Drum Sample Retrieval by Vocalisation

    Authors: Alejandro Delgado, Charalampos Saitis, Emmanouil Benetos, Mark Sandler

    Abstract: Imitating musical instruments with the human voice is an efficient way of communicating ideas between music producers, from sketching melody lines to clarifying desired sonorities. For this reason, there is an increasing interest in building applications that allow artists to efficiently pick target samples from big sound libraries just by imitating them vocally. In this study, we investigated the…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022 (under review)

  14. arXiv:2204.04646

    cs.SD cs.IR eess.AS

    Deep Embeddings for Robust User-Based Amateur Vocal Percussion Classification

    Authors: Alejandro Delgado, Emir Demirel, Vinod Subramanian, Charalampos Saitis, Mark Sandler

    Abstract: Vocal Percussion Transcription (VPT) is concerned with the automatic detection and classification of vocal percussion sound events, allowing music creators and producers to sketch drum lines on the fly. Classifier algorithms in VPT systems learn best from small user-specific datasets, which usually restrict modelling to small input feature sets to avoid data overfitting. This study explores severa…

    Submitted 10 April, 2022; originally announced April 2022.

    Comments: Accepted at Sound and Music Computing (SMC) conference 2022

  15. arXiv:2110.09223

    cs.SD eess.AS

    Learning Models for Query by Vocal Percussion: A Comparative Study

    Authors: Alejandro Delgado, SkoT McDonald, Ning Xu, Charalampos Saitis, Mark Sandler

    Abstract: The imitation of percussive sounds via the human voice is a natural and effective tool for communicating rhythmic ideas on the fly. Thus, the automatic retrieval of drum sounds using vocal percussion can help artists prototype drum patterns in a comfortable and quick way, smoothing the creative workflow as a result. Here we explore different strategies to perform this type of query, making use of…

    Submitted 18 October, 2021; originally announced October 2021.

    Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021

  16. Transfer Learning for Piano Sustain-Pedal Detection

    Authors: Beici Liang, György Fazekas, Mark Sandler

    Abstract: Detecting piano pedalling techniques in polyphonic music remains a challenging task in music information retrieval. While other piano-related tasks, such as pitch estimation and onset detection, have seen improvement through applying deep learning methods, little work has been done to develop deep learning models to detect playing techniques. In this paper, we propose a transfer learning approach…

    Submitted 24 March, 2021; originally announced March 2021.

    Comments: Published in 2019 International Joint Conference on Neural Networks (IJCNN)

  17. A New Dataset for Amateur Vocal Percussion Analysis

    Authors: Alejandro Delgado, SKoT McDonald, Ning Xu, Mark Sandler

    Abstract: The imitation of percussive instruments via the human voice is a natural way for us to communicate rhythmic ideas and, for this reason, it attracts the interest of music makers. Specifically, the automatic mapping of these vocal imitations to their emulated instruments would allow creators to realistically prototype rhythms in a faster way. The contribution of this study is two-fold. Firstly, a ne…

    Submitted 24 September, 2020; originally announced September 2020.

  18. arXiv:2001.06086

    cs.AI cs.IR cs.LG cs.SD eess.AS

    A Critical Look at the Applicability of Markov Logic Networks for Music Signal Analysis

    Authors: Johan Pauwels, György Fazekas, Mark B. Sandler

    Abstract: In recent years, Markov logic networks (MLNs) have been proposed as a potentially useful paradigm for music signal analysis. Because all hidden Markov models can be reformulated as MLNs, the latter can provide an all-encompassing framework that reuses and extends previous work in the field. However, just because it is theoretically possible to reformulate previous work as MLNs, does not mean that…

    Submitted 16 January, 2020; originally announced January 2020.

    Comments: Accepted for presentation at the Ninth International Workshop on Statistical Relational AI (StarAI 2020) at the 34th AAAI Conference on Artificial Intelligence (AAAI) in New York, on February 7th 2020

  19. arXiv:1907.02477

    cs.LG cs.CR cs.SD eess.AS

    Adversarial Attacks in Sound Event Classification

    Authors: Vinod Subramanian, Emmanouil Benetos, Ning Xu, SKoT McDonald, Mark Sandler

    Abstract: Adversarial attacks refer to a set of methods that perturb the input to a classification model in order to fool the classifier. In this paper we apply different gradient based adversarial attack algorithms on five deep learning models trained for sound event classification. Four of the models use mel-spectrogram input and one model uses raw audio input. The models represent standard architectures…

    Submitted 15 August, 2019; v1 submitted 4 July, 2019; originally announced July 2019.

    Comments: Fixed Freesound data reference to FSDKaggle2018

  20. arXiv:1903.01976

    cs.SD eess.AS

    Spectral Visibility Graphs: Application to Similarity of Harmonic Signals

    Authors: Delia Fano Yela, Dan Stowell, Mark Sandler

    Abstract: Graph theory is emerging as a new source of tools for time series analysis. One promising method is to transform a signal into its visibility graph, a representation which captures many interesting aspects of the signal. Here we introduce the visibility graph for audio spectra and propose a novel representation for audio analysis: the spectral visibility graph degree. Such representation inherentl…

    Submitted 20 June, 2019; v1 submitted 5 March, 2019; originally announced March 2019.

    Comments: European Signal Processing Conference (EUSIPCO)

  21. arXiv:1804.02325

    cs.SD eess.AS

    Does k Matter? k-NN Hubness Analysis for Kernel Additive Modelling Vocal Separation

    Authors: Delia Fano Yela, Dan Stowell, Mark Sandler

    Abstract: Kernel Additive Modelling (KAM) is a framework for source separation aiming to explicitly model inherent properties of sound sources to help with their identification and separation. KAM separates a given source by applying robust statistics on the selection of time-frequency bins obtained through a source-specific kernel, typically the k-NN function. Even though the parameter k appears to be key…

    Submitted 6 April, 2018; originally announced April 2018.

    Comments: LVA-ICA 2018 - Feedback always welcome

  22. arXiv:1802.05178

    cs.MM cs.SD eess.AS

    Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encoders

    Authors: Adib Mehrabi, Keunwoo Choi, Simon Dixon, Mark Sandler

    Abstract: The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal imitations to imitated sounds, yet little is known about how well learned features represent the perceptual similarity between vocalisations and qu…

    Submitted 14 February, 2018; originally announced February 2018.

    Comments: ICASSP 2018 camera-ready

  23. arXiv:1711.00351

    cs.SD eess.AS

    Shift-Invariant Kernel Additive Modelling for Audio Source Separation

    Authors: Delia Fano Yela, Sebastian Ewert, Ken O'Hanlon, Mark B. Sandler

    Abstract: A major goal in blind source separation to identify and separate sources is to model their inherent characteristics. While most state-of-the-art approaches are supervised methods trained on large datasets, interest in non-data-driven approaches such as Kernel Additive Modelling (KAM) remains high due to their interpretability and adaptability. KAM performs the separation of a given source applying…

    Submitted 16 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: Feedback is welcome

    ACM Class: H.5.5; I.5.1; I.5.4

    Journal ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 2018