Search | arXiv e-print repository

doi 10.1109/ACCESS.2022.3197907

Improving Post-Processing of Audio Event Detectors Using Reinforcement Learning

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: We apply post-processing to the class probability distribution outputs of audio event classification models and employ reinforcement learning to jointly discover the optimal parameters for various stages of a post-processing stack, such as the classification thresholds and the kernel sizes of median filtering algorithms used to smooth out model predictions. To achieve this we define a reinforcement learning environment where: 1) a state is the class probability distribution provided by the model for a given audio sample, 2) an action is the choice of a candidate optimal value for each parameter of the post-processing stack, 3) the reward is based on the classification accuracy metric we aim to optimize, which is the audio event-based macro F1-score in our case. We apply our post-processing to the class probability distribution outputs of two audio event classification models submitted to the DCASE Task4 2020 challenge. We find that by using reinforcement learning to discover the optimal per-class parameters for the post-processing stack that is applied to the outputs of audio event classification models, we can improve the audio event-based macro F1-score (the main metric used in the DCASE challenge to compare audio event classification accuracy) by 4-5% compared to using the same post-processing stack with manually tuned parameters.

Submitted 19 August, 2022; originally announced August 2022.

Comments: Published on IEEE Access journal, Volume 10, 2022
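The post-processing stack named in the abstract above (a classification threshold followed by median filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the order of the two stages, the edge-padding rule, and the example values are all assumptions made for illustration, and the reinforcement learning search over the parameters is not shown.

```python
from statistics import median


def median_filter(seq, kernel_size):
    """Sliding-window median over a 1-D sequence.

    Edges are padded by repeating the first and last values
    (an assumption; other padding rules are equally plausible).
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    if not seq:
        return []
    half = kernel_size // 2
    padded = [seq[0]] * half + list(seq) + [seq[-1]] * half
    return [median(padded[i:i + kernel_size]) for i in range(len(seq))]


def post_process(frame_probs, threshold, kernel_size):
    """Threshold per-frame probabilities for one class, then smooth the decisions."""
    binary = [1 if p >= threshold else 0 for p in frame_probs]
    return [int(v) for v in median_filter(binary, kernel_size)]


def events_from_frames(frames):
    """Turn a binary frame sequence into (onset, offset) frame pairs, offset exclusive."""
    events, start = [], None
    for i, active in enumerate(frames):
        if active and start is None:
            start = i
        elif not active and start is not None:
            events.append((start, i))
            start = None
    if start is not None:
        events.append((start, len(frames)))
    return events


# Hypothetical per-frame probabilities for a single class.
probs = [0.1, 0.9, 0.2, 0.8, 0.9, 0.7, 0.1, 0.1]
smoothed = post_process(probs, threshold=0.5, kernel_size=3)
detected = events_from_frames(smoothed)
```

In this sketch `threshold` and `kernel_size` are the two knobs that the abstract describes tuning per class; here they are simply fixed by hand.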

arXiv:2110.12778 [pdf, other]

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the only available information is the raw sound from the environment, as a simulated human listener placed in the environment would hear it. For this purpose we create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem. We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments. Our experiments show that our agent achieves adequate performance and generalization ability in both environments, measured by quantitative metrics, even when a limited amount of training data is available or the environment parameters shift in ways not encountered during training. We also show that a degree of agent knowledge transfer is possible between the environments.

Submitted 27 November, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: arXiv admin note: text overlap with arXiv:2105.04488

arXiv:2105.04488 [pdf, other]

doi 10.1109/ICASSP39728.2021.9415013

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Abstract: In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention in the reinforcement learning literature. Our experiments show that the agent can successfully identify a particular target speaker among a set of $N$ predefined speakers in a room and move itself towards that speaker, while avoiding collision with other speakers or going outside the room boundaries. The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment, even when a limited number of training utterances are available for each speaker.

Submitted 10 May, 2021; originally announced May 2021.

Comments: To be published in ICASSP 2021

arXiv:1807.09157 [pdf, other]

Robust Group Comparison Using Non-Parametric Block-Based Statistics

Authors: Geng Chen, Pei Zhang, Ke Li, Chong-Yaw Wee, Wenliang Pan, Yafeng Wu, Panteleimon Giannakopoulos, Sven Haller, Dinggang Shen, Pew-Thian Yap

Abstract: Voxel-based analysis methods localize brain structural differences by performing voxel-wise statistical comparisons on two groups of images aligned to a common space. This procedure requires highly accurate registration as well as a sufficiently large dataset. However, in practice, the registration algorithms are not perfect due to noise, artifacts, and complex structural variations. The sample size is also limited due to low disease prevalence, recruitment difficulties, and demographic matching issues. To address these issues, in this paper, we propose a method, called block-based statistic (BBS), for robust group comparison. BBS consists of two major components: block matching and a permutation test. Specifically, based on two groups of images aligned to a common space, we first perform block matching so that structural misalignments can be corrected. Then, based on the results given by block matching, we conduct robust non-parametric statistical inference based on the permutation test. Extensive experiments were performed on synthetic data and the real diffusion MR data of mild cognitive impairment patients. The experimental results indicate that BBS significantly improves statistical power, notwithstanding the small sample size.

Submitted 24 July, 2018; originally announced July 2018.

Comments: 17 pages, 9 figures
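The permutation test mentioned in the abstract above can be sketched generically for two groups of scalar measurements. This is a textbook Monte Carlo permutation test on the difference of means, not the BBS method itself: the block-matching step is omitted, and the test statistic, function names, permutation count, and sample values are assumptions chosen for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference of group means.

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute mean difference is at least as large as the observed one,
    with the usual +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random reassignment of group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical measurements at one location for two small groups.
p_separated = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_test([1, 2, 3], [1, 2, 3])
```

Because the null distribution is built from the data by relabeling, no distributional assumption is needed, which is the property that makes this kind of test attractive for small samples.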

arXiv:1802.06225 [pdf]

A Deep Q-Learning Agent for the L-Game with Variable Batch Training

Authors: Petros Giannakopoulos, Yannis Cotronis

Abstract: We employ the Deep Q-Learning algorithm with Experience Replay to train an agent capable of achieving a high level of play in the L-Game while self-learning from low-dimensional states. We also employ variable batch size for training in order to mitigate the loss of the rare reward signal and significantly accelerate training. Despite the large action space due to the number of possible moves, the low-dimensional state space and the rarity of rewards, which only come at the end of a game, DQL is successful in training an agent capable of strong play without the use of any search methods or domain knowledge.

Submitted 17 February, 2018; originally announced February 2018.

arXiv:1702.02482 [pdf]

doi 10.1051/epjconf/201611607005

A study on implementing a multithreaded version of the SIRENE detector simulation software for high energy neutrinos

Authors: Petros Giannakopoulos, Michail Gkoumas, Ioannis Diplas, Georgios Voularinos, Theofanis Vlachos, Konstantia Balasi, Ekaterini Tzamariudaki, Christos Filippidis, Yiannis Cotronis, Christos Markou

Abstract: The primary objective of SIRENE is to simulate the response to neutrino events of any type of high energy neutrino telescope. Additionally, it implements different geometries for a neutrino detector and different configurations and characteristics of photo-multiplier tubes (PMTs) inside the optical modules of the detector through a library of C++ classes. This could be considered a massive statistical analysis of photo-electrons. The aim of this work is the development of a multithreaded version of the SIRENE detector simulation software for high energy neutrinos. This approach allows utilization of multiple CPU cores, leading to a potentially significant decrease in the required execution time compared to the sequential code. We make use of the OpenMP framework to produce multithreaded code running on the CPU. Finally, we analyze the feasibility of a GPU-accelerated implementation.

Submitted 8 February, 2017; originally announced February 2017.

Showing 1–6 of 6 results for author: Giannakopoulos, P