
Methods for pitch analysis in contemporary popular music: multiple pitches from harmonic tones in Vitalic's music

Authors: Emmanuel Deruty, David Meredith, Maarten Grachten, Pascal Arbez-Nicolas, Andreas Hasselholt Jørgensen, Oliver Søndermølle Hansen, Magnus Stensli, Christian Nørkær Petersen

Abstract: Aims. This study suggests that the use of multiple perceived pitches arising from a single harmonic complex tone is an active and intentional feature of contemporary popular music. The phenomenon is illustrated through examples drawn from the work of electronic artist Vitalic and others. Methods. Two listening tests were conducted: (1) evaluation of the number of simultaneous pitches perceived from single harmonic tones, and (2) manual pitch transcription of sequences of harmonic tones. Relationships between signal characteristics and pitch perception were then analyzed. Results. The synthetic harmonic tones found in the musical sequences under study were observed to transmit more perceived pitches than their acoustic counterparts, with significant variation across listeners. Multiple ambiguous pitches were associated with tone properties such as prominent upper partials and particular autocorrelation profiles. Conclusions. Harmonic tones in a context of contemporary popular music can, in general, convey several ambiguous pitches. The set of perceived pitches depends on both the listener and the listening conditions.

Submitted 14 June, 2025; originally announced June 2025.

Comments: Pending review, Journal of the Audio Engineering Society

MSC Class: 00A65 ACM Class: J.5

arXiv:2506.07073 [pdf, ps, other]

Insights on Harmonic Tones from a Generative Music Experiment

Authors: Emmanuel Deruty, Maarten Grachten

Abstract: The ultimate purpose of generative music AI is music production. The studio-lab, a social form within the art-science branch of cross-disciplinarity, is a way to advance music production with AI music models. During a studio-lab experiment involving researchers, music producers, and an AI model for music generating bass-like audio, it was observed that the producers used the model's output to convey two or more pitches with a single harmonic complex tone, which in turn revealed that the model had learned to generate structured and coherent simultaneous melodic lines using monophonic sequences of harmonic complex tones. These findings prompt a reconsideration of the long-standing debate on whether humans can perceive harmonics as distinct pitches and highlight how generative AI can not only enhance musical creativity but also contribute to a deeper understanding of music.

Submitted 8 June, 2025; originally announced June 2025.

Comments: 15th International Workshop on Machine Learning and Music, September 9, 2024, Vilnius, Lithuania

MSC Class: 68T01 ACM Class: J.5

arXiv:2503.06346 [pdf, other]

Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems

Authors: Maarten Grachten, Javier Nistal

Abstract: Generative systems of musical accompaniments are rapidly growing, yet there are no standardized metrics to evaluate how well generations align with the conditional audio prompt. We introduce a distribution-based measure called "Accompaniment Prompt Adherence" (APA), and validate it through objective experiments on synthetic data perturbations, and human listening tests. Results show that APA aligns well with human judgments of adherence and is discriminative to transformations that degrade adherence. We release a Python implementation of the metric using the widely adopted pre-trained CLAP embedding model, offering a valuable tool for evaluating and comparing accompaniment generation systems.

Submitted 8 April, 2025; v1 submitted 8 March, 2025; originally announced March 2025.

Comments: Accepted for publication at the 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025)

arXiv:2406.08384 [pdf, other]

Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models

Authors: Javier Nistal, Marco Pasini, Cyran Aouameur, Maarten Grachten, Stefan Lattner

Abstract: Recent advancements in deep generative models present new opportunities for music production but also pose challenges, such as high computational demands and limited audio quality. Moreover, current systems frequently rely solely on text input and typically focus on producing complete musical pieces, which is incompatible with existing workflows in music production. To address these issues, we introduce "Diff-A-Riff," a Latent Diffusion Model designed to generate high-quality instrumental accompaniments adaptable to any musical context. This model offers control through either audio references, text prompts, or both, and produces 48kHz pseudo-stereo audio while significantly reducing inference time and memory usage. We demonstrate the model's capabilities through objective metrics and subjective listening tests, with extensive examples available on the accompanying website: sonycslparis.github.io/diffariff-companion/

Submitted 30 October, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages, 2 figures, 3 tables

Journal ref: Proc. of the 25th International Society for Music Information Retrieval, 2024

arXiv:2404.00775 [pdf, other]

Measuring Audio Prompt Adherence with Distribution-based Embedding Distances

Authors: Maarten Grachten

Abstract: An increasing number of generative music models can be conditioned on an audio prompt that serves as musical context for which the model is to create an accompaniment (often further specified using a text prompt). Evaluation of how well model outputs adhere to the audio prompt is often done in a model- or problem-specific manner, presumably because no generic evaluation method for audio prompt adherence has emerged. Such a method could be useful both in the development and training of new models, and to make performance comparable across models. In this paper we investigate whether commonly used distribution-based distances like Fréchet Audio Distance (FAD) can be used to measure audio prompt adherence. We propose a simple procedure based on a small number of constituents (an embedding model, a projection, an embedding distance, and a data fusion method), that we systematically assess using a baseline validation. In a follow-up experiment we test the sensitivity of the proposed audio adherence measure to pitch and time shift perturbations. The results show that the proposed measure is sensitive to such perturbations, even when the reference and candidate distributions are from different music collections. Although more experimentation is needed to answer unaddressed questions like the robustness of the measure to acoustic artifacts that do not affect the audio prompt adherence, the current results suggest that distribution-based embedding distances provide a viable way of measuring audio prompt adherence. A Python/PyTorch implementation of the proposed measure is publicly available as a GitHub repository.
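
As an illustration of the kind of distribution-based embedding distance the abstract refers to, the following is a minimal sketch of the Fréchet distance between two Gaussians fitted to sets of embedding vectors. It is not the paper's implementation: it assumes diagonal covariances (standard FAD uses full covariance matrices), it omits the embedding model, projection, and data fusion steps, and the names `diagonal_gaussian` and `frechet_distance_diag` are hypothetical.

```python
import math
from statistics import fmean, pvariance


def diagonal_gaussian(embeddings):
    """Fit a per-dimension mean and variance to a list of embedding vectors."""
    columns = list(zip(*embeddings))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col) for col in columns]
    return means, variances


def frechet_distance_diag(reference, candidate):
    """Frechet distance between two diagonal-covariance Gaussians.

    With diagonal covariances the general formula
        ||mu_r - mu_c||^2 + Tr(S_r + S_c - 2 (S_r S_c)^(1/2))
    reduces to a sum over dimensions. This is a simplifying assumption
    made for illustration only.
    """
    mu_r, var_r = diagonal_gaussian(reference)
    mu_c, var_c = diagonal_gaussian(candidate)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_c))
    cov_term = sum(
        vr + vc - 2.0 * math.sqrt(vr * vc) for vr, vc in zip(var_r, var_c)
    )
    return mean_term + cov_term


# Toy 2-D "embeddings": the candidate set is the reference set shifted by 1.
reference = [[0.0, 0.0], [2.0, 2.0]]
candidate = [[1.0, 1.0], [3.0, 3.0]]
distance = frechet_distance_diag(reference, candidate)
```

In this toy case the two sets have identical variances, so the distance comes entirely from the difference between the means.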

Submitted 28 December, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2402.01412 [pdf, other]

Bass Accompaniment Generation via Latent Diffusion

Authors: Marco Pasini, Maarten Grachten, Stefan Lattner

Abstract: The ability to automatically generate music that appropriately matches an arbitrary input track is a challenging task. We present a novel controllable system for generating single stems to accompany musical mixes of arbitrary length. At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem. To provide control over the timbre of generated samples, we introduce a technique to ground the latent space to a user-provided reference style during diffusion sampling. For further improving audio quality, we adapt classifier-free guidance to avoid distortions at high guidance strengths when generating an unbounded latent space. We train our model on a dataset of pairs of mixes and matching bass stems. Quantitative experiments demonstrate that, given an input mix, the proposed system can generate basslines with user-specified timbres. Our controllable conditional audio generation framework represents a significant step forward in creating generative AI tools to assist musicians in music production.

Submitted 2 February, 2024; originally announced February 2024.

Comments: ICASSP 2024

arXiv:2208.08968 [pdf, other]

"Melatonin": A Case Study on AI-induced Musical Style

Authors: Emmanuel Deruty, Maarten Grachten

Abstract: Although the use of AI tools in music composition and production is steadily increasing, as witnessed by the newly founded AI song contest, analysis of music produced using these tools is still relatively uncommon as a means to gain insight into the ways AI tools impact music production. In this paper we present a case study of "Melatonin", a song produced by extensive use of BassNet, an AI tool originally designed to generate bass lines. Through analysis of the artists' workflow and song project, we identify style characteristics of the song in relation to the affordances of the tool, highlighting manifestations of style in terms of both idiom and sound.

Submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted paper at the 3rd Conference on AI Music Creativity (September 2022)

arXiv:2206.01104 [pdf, other]

The match file format: Encoding Alignments between Scores and Performances

Authors: Francesco Foscarin, Emmanouil Karystinaios, Silvan David Peter, Carlos Cancino-Chacón, Maarten Grachten, Gerhard Widmer

Abstract: This paper presents the specifications of match: a file format that extends a MIDI human performance with note-, beat-, and downbeat-level alignments to a corresponding musical score. This enables advanced analyses of the performance that are relevant for various tasks, such as expressive performance modeling, score following, music transcription, and performer classification. The match file includes a set of score-related descriptors that makes it usable also as a bare-bones score representation. For applications that require the use of structural score elements (e.g., voices, parts, beams, slurs), the match file can be easily combined with the symbolic score. To support the practical application of our work, we release a corrected and upgraded version of the Vienna4x22 dataset of scores and performances aligned with match files.

Submitted 2 June, 2022; originally announced June 2022.

Journal ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada

arXiv:2206.01071 [pdf, other]

Partitura: A Python Package for Symbolic Music Processing

Authors: Carlos Cancino-Chacón, Silvan David Peter, Emmanouil Karystinaios, Francesco Foscarin, Maarten Grachten, Gerhard Widmer

Abstract: Partitura is a lightweight Python package for handling symbolic musical information. It provides easy access to features commonly used in music information retrieval tasks, like note arrays (lists of timed pitched events) and 2D piano roll matrices, as well as other score elements such as time and key signatures, performance directives, and repeat structures. Partitura can load musical scores (in MEI, MusicXML, Kern, and MIDI formats), MIDI performances, and score-to-performance alignments. The package includes some tools for music analysis, such as automatic pitch spelling, key signature identification, and voice separation. Partitura is an open-source project and is available at https://github.com/CPJKU/partitura/.
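
To make the two representations named in the abstract concrete, the sketch below converts a note array (a list of timed pitched events) into a 2D piano roll matrix. It deliberately does not use Partitura's own API: the tuple layout `(onset, duration, pitch)`, the function name `piano_roll`, and the `steps_per_beat` parameter are assumptions made for illustration.

```python
def piano_roll(notes, steps_per_beat=4):
    """Build a 128 x T binary piano roll from (onset, duration, pitch) notes.

    Onsets and durations are in beats, pitches are MIDI note numbers.
    Each row is a pitch and each column is a time step of
    1 / steps_per_beat beats.
    """
    end = max(onset + duration for onset, duration, _ in notes)
    n_steps = round(end * steps_per_beat)
    roll = [[0] * n_steps for _ in range(128)]
    for onset, duration, pitch in notes:
        start = round(onset * steps_per_beat)
        stop = round((onset + duration) * steps_per_beat)
        for step in range(start, stop):
            roll[pitch][step] = 1
    return roll


# Hypothetical note array: middle C for one beat, then E for half a beat.
notes = [(0.0, 1.0, 60), (1.0, 0.5, 64)]
roll = piano_roll(notes)
```

The note array keeps one record per event, whereas the piano roll discards note boundaries between repeated pitches, which is one reason a package may offer both views.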

Submitted 2 June, 2022; originally announced June 2022.

Journal ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada

arXiv:2201.13144 [pdf, other]

partitura: A Python Package for Handling Symbolic Musical Data

Authors: Maarten Grachten, Carlos Cancino-Chacón, Thassilo Gadermaier

Abstract: This demo paper introduces partitura, a Python package for handling symbolic musical information. The principal aim of this package is to handle richly structured musical information as conveyed by modern staff music notation. It provides a much wider range of possibilities to deal with music than the more reductive (but very common) piano roll-oriented approach inspired by the MIDI standard. The package is an open source project and is available on GitHub.

Submitted 31 January, 2022; originally announced January 2022.

Comments: This preprint is a slightly updated and reformatted version of the work presented at the Late Breaking/Demo Session of the 20th International Society for Music Information Retrieval Conference (ISMIR 2019), Delft, The Netherlands

arXiv:1908.00948 [pdf, other]

High-Level Control of Drum Track Generation Using Learned Patterns of Rhythmic Interaction

Authors: Stefan Lattner, Maarten Grachten

Abstract: Spurred by the potential of deep learning, computational music generation has gained renewed academic interest. A crucial issue in music generation is that of user control, especially in scenarios where the music generation process is conditioned on existing musical material. Here we propose a model for conditional kick drum track generation that takes existing musical material as input, in addition to a low-dimensional code that encodes the desired relation between the existing material and the new material to be generated. These relational codes are learned in an unsupervised manner from a music dataset. We show that codes can be sampled to create a variety of musically plausible kick drum tracks and that the model can be used to transfer kick drum patterns from one song to another. Lastly, we demonstrate that the learned codes are largely invariant to tempo and time-shift.

Submitted 2 August, 2019; originally announced August 2019.

Comments: Paper accepted at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2019), New Paltz, New York, U.S.A., October 20-23; 6 pages, 3 figures, 1 table

arXiv:1807.08636 [pdf, other]

Auto-adaptive Resonance Equalization using Dilated Residual Networks

Authors: Maarten Grachten, Emmanuel Deruty, Alexandre Tanguy

Abstract: In music and audio production, attenuation of spectral resonances is an important step towards a technically correct result. In this paper we present a two-component system to automate the task of resonance equalization. The first component is a dynamic equalizer that automatically detects resonances and offers to attenuate them by a user-specified factor. The second component is a deep neural network that predicts the optimal attenuation factor based on the windowed audio. The network is trained and validated on empirical data gathered from an experiment in which sound engineers choose their preferred attenuation factors for a set of tracks. We test two distinct network architectures for the predictive model and find that a dilated residual network operating directly on the audio signal is on a par with a network architecture that requires a prior audio feature extraction stage. Both architectures predict human-preferred resonance attenuation factors significantly better than a baseline approach.

Submitted 23 July, 2018; originally announced July 2018.

Journal ref: Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019. Pp. 405-411. https://archives.ismir.net/ismir2019/paper/000048.pdf

arXiv:1807.01080 [pdf, other]

A Computational Study of the Role of Tonal Tension in Expressive Piano Performance

Authors: Carlos Cancino-Chacón, Maarten Grachten

Abstract: Expressive variations of tempo and dynamics are an important aspect of music performances, involving a variety of underlying factors. Previous work has shown a relation between such expressive variations (in particular expressive tempo) and perceptual characteristics derived from the musical score, such as musical expectations and perceived tension. In this work we use a computational approach to study the role of three measures of tonal tension proposed by Herremans and Chew (2016) in the prediction of expressive performances of classical piano music. These features capture tonal relationships of the music represented in Chew's spiral array model, a three-dimensional representation of pitch classes, chords and keys constructed in such a way that spatial proximity represents close tonal relationships. We use non-linear sequential models (recurrent neural networks) to assess the contribution of these features to the prediction of expressive dynamics and expressive tempo using a dataset of Mozart piano sonatas performed by a professional concert pianist. Experiments of models trained with and without tonal tension features show that tonal tension helps predict change of tempo and dynamics more than absolute tempo and dynamics values. Furthermore, the improvement is stronger for dynamics than for tempo.

Submitted 3 July, 2018; originally announced July 2018.

Comments: 6 pages, 2 figures, accepted as poster at the ICMPC15/ESCOM10 in Graz, Austria

arXiv:1806.08686 [pdf, other]

A Predictive Model for Music Based on Learned Interval Representations

Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

Abstract: Connectionist sequence models (e.g., RNNs) applied to musical sequences suffer from two known problems: First, they have strictly "absolute pitch perception". Therefore, they fail to generalize over musical concepts which are commonly perceived in terms of relative distances between pitches (e.g., melodies, scale types, modes, cadences, or chord types). Second, they fall short of capturing the concepts of repetition and musical form. In this paper we introduce the recurrent gated autoencoder (RGAE), a recurrent neural network which learns and operates on interval representations of musical sequences. The relative pitch modeling increases generalization and reduces sparsity in the input data. Furthermore, it can learn sequences of copy-and-shift operations (i.e. chromatically transposed copies of musical fragments), a promising capability for learning musical repetition structure. We show that the RGAE improves the state of the art for general connectionist sequence models in learning to predict monophonic melodies, and that ensembles of relative and absolute music processing models improve the results appreciably. Furthermore, we show that the relative pitch processing of the RGAE naturally facilitates the learning and the generation of sequences of copy-and-shift operations, wherefore the RGAE greatly outperforms a common absolute pitch recurrent neural network on this task.

Submitted 22 June, 2018; originally announced June 2018.

Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 3 figures

arXiv:1806.08236 [pdf, other]

Learning Transposition-Invariant Interval Features from Symbolic Music and Audio

Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

Abstract: Many music theoretical constructs (such as scale types, modes, cadences, and chord types) are defined in terms of pitch intervals, that is, relative distances between pitches. Therefore, when computer models are employed in music tasks, it can be useful to operate on interval representations rather than on the raw musical surface. Moreover, interval representations are transposition-invariant, valuable for tasks like audio alignment, cover song detection and music structure analysis. We employ a gated autoencoder to learn fixed-length, invertible and transposition-invariant interval representations from polyphonic music in the symbolic domain and in audio. An unsupervised training method is proposed yielding an organization of intervals in the representation space which is musically plausible. Based on the representations, a transposition-invariant self-similarity matrix is constructed and used to determine repeated sections in symbolic music and in audio, yielding competitive results in the MIREX task "Discovery of Repeated Themes and Sections".
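
The transposition invariance of interval representations mentioned in the abstract can be shown with a hand-coded toy example. This sketch only covers successive melodic intervals in a monophonic line of MIDI pitches; it is not the learned gated-autoencoder representation the paper describes, and the helper names `intervals` and `transpose` are hypothetical.

```python
def intervals(pitches):
    """Successive pitch intervals in semitones for a monophonic melody."""
    return [later - earlier for earlier, later in zip(pitches, pitches[1:])]


def transpose(pitches, semitones):
    """Shift every MIDI pitch by the same number of semitones."""
    return [pitch + semitones for pitch in pitches]


# C major fragment (C D E F G) and the same fragment moved up a fourth.
melody = [60, 62, 64, 65, 67]
shifted = transpose(melody, 5)

# The absolute pitches differ, but the interval sequences are identical.
melody_intervals = intervals(melody)
shifted_intervals = intervals(shifted)
```

Because a transposition adds the same constant to every pitch, the constant cancels in each difference, which is why the two interval sequences coincide.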

Submitted 4 February, 2019; v1 submitted 21 June, 2018; originally announced June 2018.

Comments: Paper accepted at the 19th International Society for Music Information Retrieval Conference, ISMIR 2018, Paris, France, September 23-27; 8 pages, 5 figures

arXiv:1711.02427 [pdf, other]

The ACCompanion v0.1: An Expressive Accompaniment System

Authors: Carlos Cancino-Chacón, Martin Bonev, Amaury Durand, Maarten Grachten, Andreas Arzt, Laura Bishop, Werner Goebl, Gerhard Widmer

Abstract: In this paper we present a preliminary version of the ACCompanion, an expressive accompaniment system for MIDI input. The system uses a probabilistic monophonic score follower to track the position of the soloist in the score, and a linear Gaussian model to compute tempo updates. The expressiveness of the system is powered by the Basis-Mixer, a state-of-the-art computational model of expressive music performance. The system allows for expressive dynamics, timing and articulation.

Submitted 7 November, 2017; originally announced November 2017.

Comments: Presented at the Late-Breaking Demo Session of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), Suzhou, China, 2017

arXiv:1711.01634 [pdf, other]

Strategies for Conceptual Change in Convolutional Neural Networks

Authors: Maarten Grachten, Carlos Eduardo Cancino Chacón

Abstract: A remarkable feature of human beings is their capacity for creative behaviour, referring to their ability to react to problems in ways that are novel, surprising, and useful. Transformational creativity is a form of creativity where the creative behaviour is induced by a transformation of the actor's conceptual space, that is, the representational system with which the actor interprets its environment. In this report, we focus on ways of adapting systems of learned representations as they switch from performing one task to performing another. We describe an experimental comparison of multiple strategies for adaptation of learned features, and evaluate how effectively each of these strategies realizes the adaptation, in terms of the amount of training, and in terms of their ability to cope with restricted availability of training data. We show, among other things, that across handwritten digits, natural images, and classical music, adaptive strategies are systematically more effective than a baseline method that starts learning from scratch.

Submitted 25 June, 2019; v1 submitted 5 November, 2017; originally announced November 2017.

arXiv:1709.03629 [pdf, other]

What were you expecting? Using Expectancy Features to Predict Expressive Performances of Classical Piano Music

Authors: Carlos Cancino-Chacón, Maarten Grachten, David R. W. Sears, Gerhard Widmer

Abstract: In this paper we present preliminary work examining the relationship between the formation of expectations and the realization of musical performances, paying particular attention to expressive tempo and dynamics. To compute features that reflect what a listener is expecting to hear, we employ a computational model of auditory expectation called the Information Dynamics of Music model (IDyOM). We then explore how well these expectancy features, when combined with score descriptors using the Basis-Function modeling approach, can predict expressive tempo and dynamics in a dataset of Mozart piano sonata performances. Our results suggest that using expectancy features significantly improves the predictions for tempo.

Submitted 11 September, 2017; originally announced September 2017.

Comments: 6 pages, 1 figure, 10th International Workshop on Machine Learning and Music (MML 2017)

arXiv:1708.05325 [pdf, other]

Learning Musical Relations using Gated Autoencoders

Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

Abstract: Music is usually highly structured and it is still an open question how to design models which can successfully learn to recognize and represent musical structure. A fundamental problem is that structurally related patterns can have very distinct appearances, because the structural relationships are often based on transformations of musical material, like chromatic or diatonic transposition, inversion, retrograde, or rhythm change. In this preliminary work, we study the potential of two unsupervised learning techniques, Restricted Boltzmann Machines (RBMs) and Gated Autoencoders (GAEs), to capture pre-defined transformations from constructed data pairs. We evaluate the models by using the learned representations as inputs in a discriminative task where for a given type of transformation (e.g. diatonic transposition), the specific relation between two musical patterns must be recognized (e.g. an upward transposition of diatonic steps). Furthermore, we measure the reconstruction error of models when reconstructing transformed musical patterns. Lastly, we test the models in an analogy-making task. We find that it is difficult to learn musical transformations with the RBM and that the GAE is much more adequate for this task, since it is able to learn representations of specific transformations that are largely content-invariant. We believe these results show that models such as GAEs may provide the basis for more encompassing music analysis systems, by endowing them with a better understanding of the structures underlying music.

Submitted 17 August, 2017; originally announced August 2017.

Comments: In Proceedings of the 2nd Conference on Computer Simulation of Musical Creativity (CSMC 2017)

arXiv:1707.06231 [pdf, other]

From Bach to the Beatles: The simulation of human tonal expectation using ecologically-trained predictive models

Authors: Carlos Cancino-Chacón, Maarten Grachten, Kat Agres

Abstract: Tonal structure is in part conveyed by statistical regularities between musical events, and research has shown that computational models reflect tonal structure in music by capturing these regularities in schematic constructs like pitch histograms. Of the few studies that model the acquisition of perceptual learning from musical data, most have employed self-organizing models that learn a topology of static descriptions of musical contexts. Also, the stimuli used to train these models are often symbolic rather than acoustically faithful representations of musical material. In this work we investigate whether sequential predictive models of musical memory (specifically, recurrent neural networks), trained on audio from commercial CD recordings, induce tonal knowledge in a similar manner to listeners (as shown in behavioral studies in music perception). Our experiments indicate that various types of recurrent neural networks produce musical expectations that clearly convey tonal structure. Furthermore, the results imply that although implicit knowledge of tonal structure is a necessary condition for accurate musical expectation, the most accurate predictive models also use other cues beyond the tonal structure of the musical context.

Submitted 19 July, 2017; originally announced July 2017.

Comments: In Proceedings of the 18th International Society of Music Information Retrieval Conference (ISMIR 2017)

arXiv:1707.01357 [pdf, other]

Improving Content-Invariance in Gated Autoencoders for 2D and 3D Object Rotation

Authors: Stefan Lattner, Maarten Grachten

Abstract: Content-invariance in mapping codes learned by GAEs is a useful feature for various relation learning tasks. In this paper we show that the content-invariance of mapping codes for images of 2D and 3D rotated objects can be substantially improved by extending the standard GAE loss (symmetric reconstruction error) with a regularization term that penalizes the symmetric cross-reconstruction error. This error term involves reconstruction of pairs with mapping codes obtained from other pairs exhibiting similar transformations. Although this would principally require knowledge of the transformations exhibited by training pairs, our experiments show that a bootstrapping approach can sidestep this issue, and that the regularization term can effectively be used in an unsupervised setting.

Submitted 5 July, 2017; originally announced July 2017.

Comments: 10 pages

arXiv:1612.05432 [pdf, other]

Basis-Function Modeling of Loudness Variations in Ensemble Performance

Authors: Thassilo Gadermaier, Maarten Grachten, Carlos Eduardo Cancino Chacón

Abstract: This paper describes a computational model of loudness variations in expressive ensemble performance. The model predicts and explains the continuous variation of loudness as a function of information extracted automatically from the written score. Although such models have been proposed for expressive performance in solo instruments, this is (to the best of our knowledge) the first attempt to define a model for expressive performance in ensembles. To that end, we extend an existing model that was designed to model expressive piano performances, and describe the additional steps necessary for the model to deal with scores of arbitrary instrumentation, including orchestral scores. We test both linear and non-linear variants of the extended model on a data set of audio recordings of symphonic music, in a leave-one-out setting. The experiments reveal that the most successful model variant is a recurrent, non-linear model. Even if the accuracy of the predicted loudness varies from one recording to another, in several cases the model explains well over 50% of the variance in loudness.

Submitted 16 December, 2016; originally announced December 2016.

Comments: 18 pages, 3 figures, 2 tables. Originally in 2nd International Conference on New Music Concepts (ICNMC 2016), Treviso, Italy. This version may have a different layout

arXiv:1612.04742 [pdf, other]

doi 10.5920/jcms.2018.01

Imposing higher-level Structure in Polyphonic Music Generation using Convolutional Restricted Boltzmann Machines and Constraints

Authors: Stefan Lattner, Maarten Grachten, Gerhard Widmer

Abstract: We introduce a method for imposing higher-level structure on generated, polyphonic music. A Convolutional Restricted Boltzmann Machine (C-RBM) as a generative model is combined with gradient descent constraint optimisation to provide further control over the generation process. Among other things, this allows for the use of a "template" piece, from which some structural properties can be extracted, and transferred as constraints to the newly generated material. The sampling process is guided with Simulated Annealing to avoid local optima, and to find solutions that both satisfy the constraints, and are relatively stable with respect to the C-RBM. Results show that with this approach it is possible to control the higher-level self-similarity structure, the meter, and the tonal properties of the resulting musical piece, while preserving its local musical coherence.

Submitted 14 April, 2018; v1 submitted 14 December, 2016; originally announced December 2016.

Comments: 31 pages, 11 figures

Journal ref: Journal of Creative Music Systems, Volume 2, Issue 1, March 2018

arXiv:1612.02198 [pdf, other]

Towards computer-assisted understanding of dynamics in symphonic music

Authors: Maarten Grachten, Carlos Eduardo Cancino-Chacón, Thassilo Gadermaier, Gerhard Widmer

Abstract: Many people enjoy classical symphonic music. Its diverse instrumentation makes for a rich listening experience. This diversity adds to the conductor's expressive freedom to shape the sound according to their imagination. As a result, the same piece may sound quite different from one conductor to another. Differences in interpretation may be noticeable subjectively to listeners, but they are sometimes hard to pinpoint, presumably because of the acoustic complexity of the sound. We describe a computational model that interprets dynamics (expressive loudness variations in performances) in terms of the musical score, highlighting differences between performances of the same piece. We demonstrate experimentally that the model has predictive power, and give examples of conductor idiosyncrasies found by using the model as an explanatory tool. Although the present model is still in active development, it may pave the road for a consumer-oriented companion to interactive classical music understanding.

Submitted 13 December, 2016; v1 submitted 7 December, 2016; originally announced December 2016.

Showing 1–24 of 24 results for author: Grachten, M