Search | arXiv e-print repository

arXiv:2505.11915

BINAQUAL: A Full-Reference Objective Localization Similarity Metric for Binaural Audio

Authors: Davoud Shariat Panah, Dan Barry, Alessandro Ragano, Jan Skoglund, Andrew Hines

Abstract: Spatial audio enhances immersion in applications such as virtual reality, augmented reality, gaming, and cinema by creating a three-dimensional auditory experience. Ensuring the spatial fidelity of binaural audio is crucial, given that processes such as compression, encoding, or transmission can alter localization cues. While subjective listening tests like MUSHRA remain the gold standard for evaluating spatial localization quality, they are costly and time-consuming. This paper introduces BINAQUAL, a full-reference objective metric designed to assess localization similarity in binaural audio recordings. BINAQUAL adapts the AMBIQUAL metric, originally developed for localization quality assessment in the ambisonics audio format, to the binaural domain. We evaluate BINAQUAL across five key research questions, examining its sensitivity to variations in sound source locations, angle interpolations, surround speaker layouts, audio degradations, and content diversity. Results demonstrate that BINAQUAL effectively differentiates between subtle spatial variations and correlates strongly with subjective listening tests, making it a reliable metric for binaural localization quality assessment. The proposed metric provides a robust benchmark for ensuring spatial accuracy in binaural audio processing, paving the way for improved objective evaluations in immersive audio applications.

Submitted 17 May, 2025; originally announced May 2025.

Comments: Submitted to the Journal of Audio Engineering Society (JAES)

arXiv:2505.01369

Binamix -- A Python Library for Generating Binaural Audio Datasets

Authors: Dan Barry, Davoud Shariat Panah, Alessandro Ragano, Jan Skoglund, Andrew Hines

Abstract: The increasing demand for spatial audio in applications such as virtual reality, immersive media, and spatial audio research necessitates robust solutions to generate binaural audio datasets for use in testing and validation. Binamix is an open-source Python library designed to facilitate programmatic binaural mixing using the extensive SADIE II Database, which provides Head-Related Impulse Response (HRIR) and Binaural Room Impulse Response (BRIR) data for 20 subjects. The Binamix library provides a flexible and repeatable framework for creating large-scale spatial audio datasets, making it an invaluable resource for codec evaluation, audio quality metric development, and machine learning model training. A range of pre-built example scripts, utility functions, and visualization plots further streamline the process of custom pipeline creation. This paper presents an overview of the library's capabilities, including binaural rendering, impulse response interpolation, and multi-track mixing for various speaker layouts. The tools utilize a modified Delaunay triangulation technique to achieve accurate HRIR/BRIR interpolation where desired angles are not present in the data. By supporting a wide range of parameters such as azimuth, elevation, subject Impulse Responses (IRs), speaker layouts, mixing controls, and more, the library enables researchers to create large binaural datasets for any downstream purpose. Binamix empowers researchers and developers to advance spatial audio applications with reproducible methodologies by offering an open-source solution for binaural rendering and dataset generation. We release the library under the Apache 2.0 License at https://github.com/QxLabIreland/Binamix/

Submitted 2 May, 2025; originally announced May 2025.

Comments: Accepted to the 158th Audio Engineering Society Convention, 2025

arXiv:2308.10637

Metro Access Network with Convergence of Coherent and Analog RoF Data Services

Authors: Amol Delmade, Frank Slyne, Colm Browning, Daniel Kilper, Liam Barry, Marco Ruffini

Abstract: Efficient use of spectral resources will be an important aspect of converged access network deployment. This work analyzes the performance of variable-bandwidth Analog Radio-over-Fiber signals transmitted in the unfilled spectral spaces of telecom-grade ROADM channels dedicated to coherent signal transmission over the OpenIreland testbed.

Submitted 24 October, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

arXiv:2104.06798

Audio-based cough counting using independent subspace analysis

Authors: Paul Leamy, Ted Burke, Dan Barry, David Dorran

Abstract: In this paper, an algorithm designed to detect characteristic cough events in audio recordings is presented, significantly reducing the time required for manual counting. Using time-frequency representations and independent subspace analysis (ISA), sound events that exhibit characteristics of coughs are automatically detected, producing a summary of the events detected. Using a dataset created from publicly available audio recordings, this algorithm has been tested on a variety of synthesized audio scenarios representative of those likely to be encountered by subjects undergoing an ambulatory cough recording, achieving a true positive rate of 76% with an average of 2.85 false positives per minute.

Submitted 14 April, 2021; originally announced April 2021.

arXiv:1910.03159

xYOLO: A Model For Real-Time Object Detection In Humanoid Soccer On Low-End Hardware

Authors: Daniel Barry, Munir Shah, Merel Keijsers, Humayun Khan, Banon Hopman

Abstract: With the emergence of onboard vision processing for areas such as the internet of things (IoT), edge computing, and autonomous robots, there is increasing demand for computationally efficient convolutional neural network (CNN) models to perform real-time object detection on resource-constrained hardware devices. Tiny-YOLO is generally considered one of the faster object detectors for low-end devices and is the basis for our work. Our experiments on this network have shown that Tiny-YOLO can achieve 0.14 frames per second (FPS) on the Raspberry Pi 3 B, which is too slow for soccer-playing autonomous humanoid robots detecting goal and ball objects. In this paper we propose an adaptation to the YOLO CNN model, named xYOLO, that can achieve object detection at a speed of 9.66 FPS on the Raspberry Pi 3 B. This is achieved by trading an acceptable amount of accuracy, making the network approximately 70 times faster than Tiny-YOLO. Greater inference speed-ups were also achieved on a desktop CPU and GPU. Additionally, we contribute an annotated Darknet dataset for goal and ball detection.

Submitted 7 October, 2019; originally announced October 2019.

Comments: 6 pages, 5 figures

Showing 1–5 of 5 results for author: Barry, D