Search | arXiv e-print repository

Towards Training Music Taggers on Synthetic Data

Authors: Nadine Kroher, Steven Manangu, Aggelos Pikrakis

Abstract: Most contemporary music tagging systems rely on large volumes of annotated data. As an alternative, we investigate the extent to which synthetically generated music excerpts can improve tagging systems when only small annotated collections are available. To this end, we release GTZAN-synth, a synthetic dataset that follows the taxonomy of the well-known GTZAN dataset while being ten times larger in data volume. We first observe that simply adding this synthetic dataset to the training split of GTZAN does not result in performance improvements. We then proceed to investigate domain adaptation, transfer learning and fine-tuning strategies for the task at hand and draw the conclusion that the last two options yield an increase in accuracy. Overall, the proposed approach can be considered as a first guide in a promising field for future research.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, accepted to the 21st International Conference on Content-based Multimedia Indexing (CBMI) 2024, code available at https://github.com/NadineKroher/music-tagging-synthetic-data-cbmi-2024

ACM Class: I.2

arXiv:2312.07594 [pdf]

On the Prediction of Hardware Security Properties of HLS Designs Using Graph Neural Networks

Authors: Amalia Artemis Koufopoulou, Athanasios Papadimitriou, Aggelos Pikrakis, Mihalis Psarakis, David Hely

Abstract: High-level synthesis (HLS) tools have provided significant productivity enhancements to the design flow of digital systems in recent years, resulting in highly optimized circuits in terms of area and latency. Given the evolution of hardware attacks, which can render them vulnerable, it is essential to consider security as a significant aspect of the HLS design flow. Yet the need to evaluate a huge number of functionally equivalent designs of the HLS design space challenges hardware security evaluation methods (e.g., fault injection (FI) campaigns). In this work, we propose an evaluation methodology of hardware security properties of HLS-produced designs using state-of-the-art Graph Neural Network (GNN) approaches that achieves significant speedup and better scalability than typical evaluation methods (such as FI). We demonstrate the proposed methodology on a Double Modular Redundancy (DMR) countermeasure applied on an AES SBox implementation, enhanced by diversifying the redundant modules through HLS directives. The experimental results show that GNNs can be efficiently trained to predict important hardware security metrics concerning fault attacks (e.g., critical and detection error rates), by using regression. The proposed method predicts the fault vulnerability metrics of the HLS-based designs with high R-squared scores and achieves huge speedup compared to fault injection once the training of the GNN is completed.

Submitted 11 December, 2023; originally announced December 2023.

Comments: 6 pages, 2 figures, 3 tables, submitted to 2023 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)

arXiv:2311.09094 [pdf, other]

Can MusicGen Create Training Data for MIR Tasks?

Authors: Nadine Kroher, Helena Cuesta, Aggelos Pikrakis

Abstract: We are investigating the broader concept of using AI-based generative music systems to generate training data for Music Information Retrieval (MIR) tasks. To kick off this line of work, we ran an initial experiment in which we trained a genre classifier on a fully artificial music dataset created with MusicGen. We constructed over 50 000 genre-conditioned textual descriptions and generated a collection of music excerpts that covers five musical genres. Our preliminary results show that the proposed model can learn genre-specific characteristics from artificial music tracks that generalise well to real-world music recordings.

Submitted 15 November, 2023; originally announced November 2023.

Comments: This is an extended abstract presented at the Late-Breaking / Demo Session of the International Society for Music Information Retrieval Conference (ISMIR) 2023 (Milan, Italy)

arXiv:2208.09201 [pdf, other]

doi 10.1109/ACCESS.2022.3197907

Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: We apply post-processing to the class probability distribution outputs of audio event classification models and employ reinforcement learning to jointly discover the optimal parameters for various stages of a post-processing stack, such as the classification thresholds and the kernel sizes of median filtering algorithms used to smooth out model predictions. To achieve this we define a reinforcement learning environment where: 1) a state is the class probability distribution provided by the model for a given audio sample, 2) an action is the choice of a candidate optimal value for each parameter of the post-processing stack, 3) the reward is based on the classification accuracy metric we aim to optimize, which is the audio event-based macro F1-score in our case. We apply our post-processing to the class probability distribution outputs of two audio event classification models submitted to the DCASE Task4 2020 challenge. We find that by using reinforcement learning to discover the optimal per-class parameters for the post-processing stack that is applied to the outputs of audio event classification models, we can improve the audio event-based macro F1-score (the main metric used in the DCASE challenge to compare audio event classification accuracy) by 4-5% compared to using the same post-processing stack with manually tuned parameters.

Submitted 19 August, 2022; originally announced August 2022.

Comments: Published in the IEEE Access journal, Volume 10, 2022
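
As a rough illustration of the post-processing stack described in the abstract above (a classification threshold followed by median filtering), the following Python sketch applies one candidate (threshold, kernel size) pair to the frame-level probabilities of a single class and scores the result. It is not the authors' code: the function names are hypothetical, a simple frame-level F1-score stands in for the event-based macro F1-score used in the paper, and the reinforcement learning agent that would choose the parameters is omitted.

```python
from statistics import median


def median_filter(seq, kernel):
    """Sliding median with an odd kernel; edges are padded by repeating
    the first and last values (an assumption made for this sketch)."""
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    if not seq:
        return []
    half = kernel // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + kernel]) for i in range(len(seq))]


def post_process(probs, threshold, kernel):
    """Binarise per-frame probabilities of one class, then smooth the
    binary decisions with a median filter."""
    binary = [1 if p >= threshold else 0 for p in probs]
    return [int(v) for v in median_filter(binary, kernel)]


def frame_f1(pred, truth):
    """Frame-level F1-score; a simplified stand-in for the event-based
    macro F1-score mentioned in the abstract."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up probabilities and labels, for illustration only.
    probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.7, 0.1, 0.1]
    truth = [0, 0, 1, 1, 1, 1, 0, 0]
    for kernel in (1, 3):
        pred = post_process(probs, threshold=0.5, kernel=kernel)
        print(kernel, pred, frame_f1(pred, truth))
```

In the setting the abstract describes, a score of this kind would serve as the reward for an action, that is, for one choice of parameter values for the stack.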

arXiv:2110.12778 [pdf, other]

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the only available information is the raw sound from the environment, as a simulated human listener placed in the environment would hear it. For this purpose we create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem. We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments. Our experiments show that our agent achieves adequate performance and generalization ability in both environments, measured by quantitative metrics, even when a limited amount of training data is available or the environment parameters shift in ways not encountered during training. We also show that a degree of agent knowledge transfer is possible between the environments.

Submitted 27 November, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:2105.04488

arXiv:2105.04488 [pdf, other]

doi 10.1109/ICASSP39728.2021.9415013

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention in the reinforcement learning literature. Our experiments show that the agent can successfully identify a particular target speaker among a set of $N$ predefined speakers in a room and move itself towards that speaker, while avoiding collision with other speakers or going outside the room boundaries. The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment, even when a limited number of training utterances are available for each speaker.

Submitted 10 May, 2021; originally announced May 2021.

Comments: To be published in ICASSP 2021

arXiv:2102.08870 [pdf, other]

Online Co-movement Pattern Prediction in Mobility Data

Authors: Andreas Tritsarolis, Eva Chondrodima, Panagiotis Tampakis, Aggelos Pikrakis

Abstract: Predictive analytics over mobility data are of great importance since they can help an analyst predict events, such as collisions, encounters, traffic jams, etc. A typical example of such analytics is future location prediction, where the goal is to predict the future location of a moving object, given a look-ahead time. What is even more challenging is being able to accurately predict collective behavioural patterns of movement, such as co-movement patterns. In this paper, we provide an accurate solution to the problem of Online Prediction of Co-movement Patterns. In more detail, we split the original problem into two sub-problems, namely Future Location Prediction and Evolving Cluster Detection. Furthermore, in order to be able to calculate the accuracy of our solution, we propose a co-movement pattern similarity measure, which allows us to match the predicted clusters with the actual ones. Finally, the accuracy of our solution is demonstrated experimentally over a real dataset from the maritime domain.

Submitted 17 February, 2021; originally announced February 2021.

arXiv:1807.00069 [pdf, other]

Exploratory Analysis of a Large Flamenco Corpus using an Ensemble of Convolutional Neural Networks as a Structural Annotation Backend

Authors: Nadine Kroher, Aggelos Pikrakis

Abstract: We present computational tools that we developed for the analysis of a large corpus of flamenco music recordings, along with the related exploratory findings. The proposed computational backend is based on a set of Convolutional Neural Networks that provide the structural annotation of each music recording with respect to the presence of vocals, guitar and hand-clapping ("palmas"). The resulting automatically extracted annotations allowed for the visualization of music recordings in structurally meaningful ways, the extraction of global statistics related to the instrumentation of flamenco music, the detection of a cappella and instrumental recordings for which no such information existed, the investigation of differences in structure and instrumentation across styles and the study of tonality across instrumentation and styles. The reported findings show that it is feasible to perform a large-scale analysis of flamenco music with state-of-the-art classification technology and produce automatically extracted descriptors that are both musicologically valid and useful, in the sense that they can enhance conventional metadata schemes and assist in bridging the semantic gap between audio recordings and high-level musicological concepts.

Submitted 29 June, 2018; originally announced July 2018.

arXiv:1612.08391 [pdf, other]

Audio-based Distributional Semantic Models for Music Auto-tagging and Similarity Measurement

Authors: Giannis Karamanolakis, Elias Iosif, Athanasia Zlatintsi, Aggelos Pikrakis, Alexandros Potamianos

Abstract: The recent development of Audio-based Distributional Semantic Models (ADSMs) enables the computation of audio and lexical vector representations in a joint acoustic-semantic space. In this work, these joint representations are applied to the problem of automatic tag generation. The predicted tags together with their corresponding acoustic representation are exploited for the construction of acoustic-semantic clip embeddings. The proposed algorithms are evaluated on the task of similarity measurement between music clips. Acoustic-semantic models are shown to outperform the state of the art for this task and produce high-quality tags for audio/music clips.

Submitted 26 December, 2016; originally announced December 2016.

Showing 1–9 of 9 results for author: Pikrakis, A