-
Ambisonics Encoder for Wearable Array with Improved Binaural Reproduction
Authors:
Yhonatan Gayer,
Vladimir Tourbabin,
Zamir Ben-Hur,
David Alon,
Boaz Rafaely
Abstract:
Ambisonics Signal Matching (ASM) is a recently proposed signal-independent approach to encoding Ambisonic signals from wearable microphone arrays, enabling efficient and standardized spatial sound reproduction. However, reproduction accuracy is currently limited due to the non-ideal layout of the microphones. This research introduces an enhanced ASM encoder that reformulates the loss function by integrating a Binaural Signal Matching (BSM) term into the optimization framework. The aim of this reformulation is to improve the accuracy of binaural reproduction when integrating the Ambisonic signal with Head-Related Transfer Functions (HRTFs), making the encoded Ambisonic signal better suited for binaural reproduction. This paper first presents the mathematical formulation developed to align the ASM and BSM objectives in a single loss function, followed by a simulation study with a simulated microphone array mounted on a rigid sphere representing a head-mounted wearable array. The analysis shows that improved binaural reproduction with the encoded Ambisonic signal can be achieved using this joint ASM-BSM optimization, thereby enabling higher-quality binaural playback for virtual and augmented reality applications based on Ambisonics.
Submitted 5 July, 2025;
originally announced July 2025.
-
Loss functions incorporating auditory spatial perception in deep learning -- a review
Authors:
Boaz Rafaely,
Stefan Weinzierl,
Or Berebi,
Fabian Brinkmann
Abstract:
Binaural reproduction aims to deliver immersive spatial audio with high perceptual realism over headphones. Loss functions play a central role in optimizing and evaluating algorithms that generate binaural signals. However, traditional signal-related difference measures often fail to capture the perceptual properties that are essential to spatial audio quality. This review paper surveys recent loss functions that incorporate spatial perception cues relevant to binaural reproduction. It focuses on losses applied to binaural signals, which are often derived from microphone recordings or Ambisonics signals, while excluding those based on room impulse responses. Guided by the Spatial Audio Quality Inventory (SAQI), the review emphasizes perceptual dimensions related to source localization and room response, while excluding general spectral-temporal attributes. The literature survey reveals a strong focus on localization cues, such as interaural time and level differences (ITDs, ILDs), while reverberation and other room acoustic attributes remain less explored in loss function design. Recent works that estimate room acoustic parameters and develop embeddings that capture room characteristics indicate their potential for future integration into neural network training. The paper concludes by highlighting future research directions toward more perceptually grounded loss functions that better capture the listener's spatial experience.
Submitted 24 June, 2025;
originally announced June 2025.
-
BSM-iMagLS: ILD Informed Binaural Signal Matching for Reproduction with Head-Mounted Microphone Arrays
Authors:
Or Berebi,
Zamir Ben-Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
Headphone listening in applications such as augmented and virtual reality (AR and VR) relies on high-quality spatial audio to ensure immersion, making accurate binaural reproduction a critical component. As capture devices, wearable arrays with only a few microphones with irregular arrangement face challenges in achieving a reproduction quality comparable to that of arrays with a large number of microphones. Binaural signal matching (BSM) has recently been presented as a signal-independent approach for generating high-quality binaural signal using only a few microphones, which is further improved using magnitude-least squares (MagLS) optimization at high frequencies. This paper extends BSM with MagLS by introducing interaural level difference (ILD) into the MagLS, integrated into BSM (BSM-iMagLS). Using a deep neural network (DNN)-based solver, BSM-iMagLS achieves joint optimization of magnitude, ILD, and magnitude derivatives, improving spatial fidelity. Performance is validated through theoretical analysis, numerical simulations with diverse HRTFs and head-mounted array geometries, and listening experiments, demonstrating a substantial reduction in ILD errors while maintaining comparable magnitude accuracy to state-of-the-art solutions. The results highlight the potential of BSM-iMagLS to enhance binaural reproduction for wearable and portable devices.
Submitted 25 June, 2025; v1 submitted 30 January, 2025;
originally announced January 2025.
-
Ambisonics Binaural Rendering via Masked Magnitude Least Squares
Authors:
Or Berebi,
Fabian Brinkmann,
Stefan Weinzierl,
Boaz Rafaely
Abstract:
Ambisonics rendering has become an integral part of 3D audio for headphones. It works well with existing recording hardware, the processing cost is mostly independent of the number of sound sources, and it elegantly allows for rotating the scene and listener. One challenge in Ambisonics headphone rendering is to find a perceptually well-behaved low-order representation of the Head-Related Transfer Functions (HRTFs) that are contained in the rendering pipeline. Low-order rendering is of interest when working with microphone arrays containing only a few sensors, or for reducing the bandwidth for signal transmission. Magnitude Least Squares (MagLS) rendering, which discards high-frequency interaural phase information in favor of reducing magnitude errors, became the de facto standard for this. Building upon this idea, we suggest Masked Magnitude Least Squares, which optimizes the Ambisonics coefficients with a neural network and employs a spatio-spectral weighting mask to control the accuracy of the magnitude reconstruction. In the tested case, the weighting mask helped to maintain high-frequency notches in the low-order HRTFs and improved the modeled median plane localization performance in comparison to MagLS, while only marginally affecting the overall accuracy of the magnitude reconstruction.
Submitted 30 January, 2025;
originally announced January 2025.
-
The importance of spatial and spectral information in multiple speaker tracking
Authors:
Hanan Beit-On,
Vladimir Tourbabin,
Boaz Rafaely
Abstract:
Multi-speaker localization and tracking using microphone array recording is of importance in a wide range of applications. One of the challenges with multi-speaker tracking is to associate direction estimates with the correct speaker. Most existing association approaches rely on spatial or spectral information alone, leading to performance degradation when one of these information channels is partially known or missing. This paper studies a joint probability data association (JPDA)-based method that facilitates association based on joint spatial-spectral information. This is achieved by integrating speaker time-frequency (TF) masks, estimated based on spectral information, in the association probabilities calculation. An experimental study that tested the proposed method on recordings from the LOCATA challenge demonstrates the enhanced performance obtained by using joint spatial-spectral information in the association.
Submitted 15 October, 2024;
originally announced October 2024.
-
Blind Localization of Early Room Reflections with Arbitrary Microphone Array
Authors:
Yogev Hadadi,
Vladimir Tourbabin,
Zamir Ben-Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
Blindly estimating the direction of arrival (DoA) of early room reflections without prior knowledge of the room impulse response or source signal is highly valuable in audio signal processing applications. The FF-PHALCOR (Frequency Focusing PHase ALigned CORrelation) method was recently developed for this purpose, extending the original PHALCOR method to work with arbitrary arrays rather than just spherical ones. Previous studies have provided only initial insights into its performance. This study offers a comprehensive analysis of the method's performance and limitations, examining how reflection characteristics such as delay, amplitude, and spatial density affect its effectiveness. The research also proposes improvements to overcome these limitations, enhancing detection quality and reducing false alarms. Additionally, the study examined how spatial perception is affected by generating room impulse responses using estimated reflection information. The findings suggest a perceptual advantage of the proposed approach over the baseline, with particularly high perceptual quality when using the spherical array with 32 microphones. However, the quality is somewhat reduced when using a semi-circular array with only 6 microphones.
Submitted 23 September, 2024;
originally announced September 2024.
-
Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting
Authors:
Daniel A. Mitchell,
Boaz Rafaely,
Anurag Kumar,
Vladimir Tourbabin
Abstract:
Direction-of-arrival estimation of multiple speakers in a room is an important task for a wide range of applications. In particular, challenging environments with moving speakers, reverberation and noise, lead to significant performance degradation for current methods. With the aim of better understanding factors affecting performance and improving current methods, in this paper multi-speaker direction-of-arrival (DOA) estimation is investigated using a modified version of the local space domain distance (LSDD) algorithm in a noisy, dynamic and reverberant environment employing a wearable microphone array. This study utilizes the recently published EasyCom speech dataset, recorded using a wearable microphone array mounted on eyeglasses. While the original LSDD algorithm demonstrates strong performance in static environments, its efficacy significantly diminishes in the dynamic settings of the EasyCom dataset. Several enhancements to the LSDD algorithm are developed following a comprehensive performance and system analysis, which enable improved DOA estimation under these challenging conditions. These improvements include incorporating a weighted reliability approach and introducing a new quality measure that reliably identifies the more accurate DOA estimates, thereby enhancing both the robustness and accuracy of the algorithm in challenging environments.
Submitted 22 September, 2024;
originally announced September 2024.
-
Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays
Authors:
Ami Berger,
Vladimir Tourbabin,
Jacob Donley,
Zamir Ben-Hur,
Boaz Rafaely
Abstract:
The increasing popularity of spatial audio in applications such as teleconferencing, entertainment, and virtual reality has led to the recent development of binaural reproduction methods. However, only a few of these methods are well-suited for wearable and mobile arrays, which typically consist of a small number of microphones. One such method is binaural signal matching (BSM), which has been shown to produce high-quality binaural signals for wearable arrays. However, BSM may be suboptimal in cases of high direct-to-reverberant ratio (DRR) as it is based on the diffuse sound field assumption. To overcome this limitation, previous studies incorporated sound-field models other than diffuse. However, performance may be sensitive to signal estimation errors. This paper aims to provide a systematic and comprehensive analysis of signal-dependent vs. signal-independent BSM, so that the benefits and limitations of the methods become clearer. Two signal-dependent BSM-based methods designed for high DRR scenarios, which incorporate a sound field model composed of direct and reverberant components, are investigated mathematically and using simulations, validated by a listening test, and compared to the signal-independent BSM. The results show that signal-dependent BSM can significantly improve performance, in particular in the direction of the source, while presenting only a negligible degradation in other directions. Furthermore, when source direction estimation is inaccurate, the performance of the signal-dependent BSM degrades to equal that of the signal-independent BSM, which is a desirable robustness property.
Submitted 14 February, 2025; v1 submitted 18 September, 2024;
originally announced September 2024.
-
Assessing the Potential Impact of Direction-Dependent HRTF Selection on Sound Localization Accuracy
Authors:
Sapir Goldring,
Zamir Ben Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
This study investigates the approach of direction-dependent selection of Head-Related Transfer Functions (HRTFs) and its impact on sound localization accuracy. For applications such as virtual reality (VR) and teleconferencing, obtaining individualized HRTFs can be beneficial yet challenging. The objective of this work is therefore to assess whether incorporating HRTFs in a direction-dependent manner could improve localization precision without the need to obtain individualized HRTFs. A localization experiment conducted with a VR headset assessed localization errors, comparing an overall best HRTF from a set against selecting the best HRTF based on average performance in each direction. The results demonstrate a substantial improvement in elevation localization error with the method motivated by direction-dependent HRTF selection, while revealing insignificant differences in azimuth errors.
Submitted 8 August, 2024;
originally announced August 2024.
-
Feasibility of iMagLS-BSM -- ILD Informed Binaural Signal Matching with Arbitrary Microphone Arrays
Authors:
Or Berebi,
Zamir Ben-Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
Binaural reproduction for headphone-centric listening has become a focal point in ongoing research, particularly within the realm of advancing technologies such as augmented and virtual reality (AR and VR). High-quality spatial audio in these applications is essential to uphold a seamless sense of immersion. However, challenges arise from wearable recording devices equipped with only a limited number of microphones and irregular microphone placements due to design constraints. These factors contribute to limited reproduction quality compared to reference signals captured by high-order microphone arrays. This paper introduces a novel optimization loss tailored for a beamforming-based, signal-independent binaural reproduction scheme. This method, named iMagLS-BSM, incorporates an interaural level difference (ILD) error term into the previously proposed binaural signal matching (BSM) magnitude least squares (MagLS) rendering loss for lateral plane angles. The method leverages nonlinear programming to minimize the introduced loss. Preliminary results show a substantial reduction in ILD error, while maintaining a binaural magnitude error comparable to that achieved with a MagLS BSM solution. These findings hold promise for enhancing the overall spatial quality of the resulting binaural signals.
Submitted 7 August, 2024;
originally announced August 2024.
-
Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations
Authors:
Lior Madmoni,
Zamir Ben-Hur,
Jacob Donley,
Vladimir Tourbabin,
Boaz Rafaely
Abstract:
Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to immerse the listener in a virtual or remote environment with such devices, it is essential to generate realistic and accurate binaural signals. This is challenging, especially since the microphone arrays mounted on these devices are typically composed of an arbitrarily-arranged small number of microphones, which impedes the use of standard audio formats like Ambisonics, and provides limited spatial resolution. The binaural signal matching (BSM) method was developed recently to overcome these challenges. While it produced binaural signals with low error using relatively simple arrays, its performance degraded significantly when head rotation was introduced. This paper aims to develop the BSM method further and overcome its limitations. For this purpose, the method is first analyzed in detail, and a design framework that guarantees accurate binaural reproduction for relatively complex acoustic environments is presented. Next, it is shown that the BSM accuracy may significantly degrade at high frequencies, and thus, a perceptually motivated extension to the method is proposed, based on a magnitude least-squares (MagLS) formulation. These insights and developments are then analyzed with the help of an extensive simulation study of a simple six-microphone semi-circular array. It is further shown that the BSM-MagLS method can be very useful in compensating for head rotations with this array. Finally, a listening experiment is conducted with a four-microphone array on a pair of glasses in a reverberant speech environment and including head rotations, where it is shown that BSM-MagLS can indeed produce binaural signals with a high perceived quality.
Submitted 29 April, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
On HRTF Notch Frequency Prediction Using Anthropometric Features and Neural Networks
Authors:
Lior Arbel,
Ishwarya Ananthabhotla,
Zamir Ben-Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
High-fidelity spatial audio often performs better when produced using a personalized head-related transfer function (HRTF). However, the direct acquisition of HRTFs is cumbersome and requires specialized equipment. Thus, many personalization methods estimate HRTF features from easily obtained anthropometric features of the pinna, head, and torso. The first HRTF notch frequency (N1) is known to be a dominant feature in elevation localization, and thus a useful feature for HRTF personalization. This paper describes the prediction of N1 frequency from pinna anthropometry using a neural model. Prediction is performed separately on three databases, both simulated and measured, and then by domain mixing between the databases. The model successfully predicts N1 frequency for individual databases and by domain mixing between some databases. Prediction errors are better than or comparable to those previously reported, showing significant improvement when acquired over a large database and with a larger output range.
Submitted 12 March, 2024;
originally announced March 2024.
-
Ambisonics Networks -- The Effect Of Radial Functions Regularization
Authors:
Bar Shaybet,
Anurag Kumar,
Vladimir Tourbabin,
Boaz Rafaely
Abstract:
Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and utilize the Ambisonics as their input signal. The process of encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. This can be overcome by regularization, with the downside of introducing errors to the Ambisonics encoding. This paper aims to investigate the impact of different ways of regularization on Deep Neural Network (DNN) training and performance. Ideally, these networks should be robust to the way of regularization. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness on an example algorithm of speaker localization based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the way of regularization, and an informed approach is proposed and investigated, highlighting the importance of regularization information.
Submitted 29 February, 2024;
originally announced February 2024.
-
Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction
Authors:
Yhonatan Gayer,
Vladimir Tourbabin,
Zamir Ben-Hur,
Jacob Donley,
Boaz Rafaely
Abstract:
In the rapidly evolving fields of virtual and augmented reality, accurate spatial audio capture and reproduction are essential. For these applications, Ambisonics has emerged as a standard format. However, existing methods for encoding Ambisonics signals from arbitrary microphone arrays face challenges, such as errors due to the irregular array configurations and limited spatial resolution resulting from a typically small number of microphones. To address these limitations and challenges, a mathematical framework for studying Ambisonics encoding is presented, highlighting the importance of incorporating the full steering function, and providing a novel measure for predicting the accuracy of encoding each Ambisonics channel from the steering functions alone. Furthermore, novel residual channels are formulated supplementing the Ambisonics channels. A simulation study for several array configurations demonstrates a reduction in binaural error for this approach.
Submitted 27 February, 2024;
originally announced February 2024.
-
Theory and investigation of acoustic multiple-input multiple-output systems based on spherical arrays in a room
Authors:
Hai Morgenstern,
Boaz Rafaely,
Franz Zotter
Abstract:
Spatial attributes of room acoustics have been widely studied using microphone and loudspeaker arrays. However, systems that combine both arrays, referred to as multiple-input multiple-output (MIMO) systems, have only been studied to a limited degree in this context. These systems can potentially provide a powerful tool for room acoustics analysis due to the ability to simultaneously control both arrays. This paper offers a theoretical framework for the spatial analysis of enclosed sound fields using a MIMO system comprising spherical loudspeaker and microphone arrays. A system transfer function is formulated in matrix form for free-field conditions, and its properties are studied using tools from linear algebra. The system is shown to have unit-rank, regardless of the array types, and its singular vectors are related to the directions of arrival and radiation at the microphone and loudspeaker arrays, respectively. The formulation is then generalized to apply to rooms, using an image source method. In this case, the rank of the system is related to the number of significant reflections. The paper ends with simulation studies, which support the developed theory, and with an extensive reflection analysis of a room impulse response, using the platform of a MIMO system.
Submitted 7 January, 2024;
originally announced January 2024.
-
Modal smoothing for analysis of room reflections measured with spherical microphone and loudspeaker arrays
Authors:
Hai Morgenstern,
Boaz Rafaely
Abstract:
Spatial analysis of room acoustics is an ongoing research topic. Microphone arrays have been employed for spatial analyses with an important objective being the estimation of the direction-of-arrival (DOA) of direct sound and early room reflections using room impulse responses (RIRs). An optimal method for DOA estimation is the multiple signal classification algorithm. When RIRs are considered, this method typically fails due to the correlation of room reflections, which leads to rank deficiency of the cross-spectrum matrix. Preprocessing methods for rank restoration, which may involve averaging over frequency, for example, have been proposed exclusively for spherical arrays. However, these methods fail in the case of reflections with equal time delays, which may arise in practice and could be of interest. In this paper, a method is proposed for systems that combine a spherical microphone array and a spherical loudspeaker array, referred to as multiple-input multiple-output systems. This method, referred to as modal smoothing, exploits the additional spatial diversity for rank restoration and succeeds where previous methods fail, as demonstrated in a simulation study. Finally, combining modal smoothing with a preprocessing method is proposed in order to increase the number of DOAs that can be estimated using low-order spherical loudspeaker arrays.
Submitted 7 January, 2024;
originally announced January 2024.
-
Spatial Reverberation and Dereverberation using an Acoustic Multiple-Input Multiple-Output System
Authors:
Hai Morgenstern,
Boaz Rafaely
Abstract:
Methods are proposed for modifying the reverberation characteristics of sound fields in rooms by employing a loudspeaker with adjustable directivity, realized with a compact spherical loudspeaker array (SLA). These methods are based on minimization and maximization of clarity and direct-to-reverberant sound ratio. Significant modification of reverberation is achieved by these methods, as shown in simulation studies. The system under investigation includes a spherical microphone array and an SLA comprising a multiple-input multiple-output system. The robustness of these methods to system identification errors is also investigated. Finally, reverberation and dereverberation results are validated by a listening experiment.
Submitted 7 January, 2024;
originally announced January 2024.
-
Design framework for spherical microphone and loudspeaker arrays in a multiple-input multiple-output system
Authors:
Hai Morgenstern,
Boaz Rafaely,
Markus Noisternig
Abstract:
Spherical microphone arrays (SMAs) and spherical loudspeaker arrays (SLAs) facilitate the study of room acoustics due to the three-dimensional analysis they provide. More recently, systems that combine both arrays, referred to as multiple-input multiple-output (MIMO) systems, have been proposed due to the added spatial diversity they facilitate. The literature provides frameworks for designing SMAs and SLAs separately, including error analysis from which the operating frequency range (OFR) of an array is defined. However, such a framework does not exist for the joint design of an SMA and an SLA that comprise a MIMO system. This paper develops a design framework for MIMO systems based on a model that addresses errors and highlights the importance of a matched design. Expanding on a free-field assumption, errors are incorporated separately for each array and error bounds are defined, facilitating error analysis for the system. The dependency of the error bounds on the SLA and SMA parameters is studied, and it is recommended that parameters be chosen to ensure matched OFRs of the arrays in MIMO system design. A design example is provided, demonstrating the superiority of a matched system over an unmatched system in the synthesis of directional room impulse responses.
Submitted 6 January, 2024;
originally announced January 2024.
-
Theoretical Framework for the Optimization of Microphone Array Configuration for Humanoid Robot Audition
Authors:
Vladimir Tourbabin,
Boaz Rafaely
Abstract:
An important aspect of a humanoid robot is audition. Previous work has presented robot systems capable of sound localization and source segregation based on microphone arrays with various configurations. However, no theoretical framework for the design of these arrays has been presented. In the current paper, a design framework is proposed based on a novel array quality measure. The measure is based on the effective rank of a matrix composed of the generalized head related transfer functions (GHRTFs) that account for microphone positions other than the ears. The measure is shown to be theoretically related to standard array performance measures such as beamforming robustness and DOA estimation accuracy. Then, the measure is applied to produce sample designs of microphone arrays. Their performance is investigated numerically, verifying the advantages of array design based on the proposed theoretical framework.
Submitted 6 January, 2024;
originally announced January 2024.
-
Direction of Arrival Estimation Using Microphone Array Processing for Moving Humanoid Robots
Authors:
Vladimir Tourbabin,
Boaz Rafaely
Abstract:
The auditory system of humanoid robots has gained increased attention in recent years. This system typically acquires the surrounding sound field by means of a microphone array. Signals acquired by the array are then processed using various methods. One of the widely applied methods is direction of arrival estimation. The conventional direction of arrival estimation methods assume that the array is fixed at a given position during the estimation. However, this is not necessarily true for an array installed on a moving humanoid robot. The array motion, if not accounted for appropriately, can introduce a significant error in the estimated direction of arrival. The current paper presents a signal model that takes the motion into account. Based on this model, two processing methods are proposed. The first one compensates for the motion of the robot. The second method is applicable to periodic signals and utilizes the motion in order to enhance the performance to a level beyond that of a stationary array. Numerical simulations and an experimental study are provided, demonstrating that the motion compensation method almost eliminates the motion-related error. It is also demonstrated that by using the motion-based enhancement method it is possible to improve the direction of arrival estimation performance, as compared to that obtained when using a stationary array.
Submitted 4 January, 2024;
originally announced January 2024.
-
Optimal Real-Weighted Beamforming With Application to Linear and Spherical Arrays
Authors:
V. Tourbabin,
M. Agmon,
B. Rafaely,
J. Tabrikian
Abstract:
One of the uses of sensor arrays is for spatial filtering or beamforming. Current digital signal processing methods facilitate complex-weighted beamforming, providing flexibility in array design. Previous studies proposed the use of real-valued beamforming weights, which, although they reduce design flexibility, may provide a range of benefits, e.g., simplified beamformer implementation or efficient beamforming algorithms. This paper presents a new method for the design of arrays with real-valued weights that achieve maximum directivity, providing a closed-form solution for the array weights. The method is studied for linear and spherical arrays, where it is shown that rigid spherical arrays are particularly suitable for real-weight designs as they do not suffer from grating lobes, a dominant feature in linear arrays with real weights. A simulation study is presented for linear and spherical arrays, along with an experimental investigation, validating the theoretical developments.
Submitted 4 January, 2024;
originally announced January 2024.
-
The role of direct sound spherical harmonics representation in externalization using binaural reproduction
Authors:
Eran Miller,
Boaz Rafaely
Abstract:
The importance of the information in the direct sound to human perception of spatial sound sources is an ongoing research topic. The classification between direct sound and diffuse or reverberant sound forms the basis of numerous studies in the field of spatial audio. In particular, parametric spatial audio representation methods use this classification and employ signal processing in order to enhance the audio quality at reproduction. However, current literature does not provide information concerning the impact of ideal direct sound representation on externalization, in the context of Ambisonics. This paper aims to assess the importance of the spatial information in the direct sound in the externalization of a sound field when using binaural reproduction. This is done in the spherical harmonics (SH) domain, where an ideal direct sound representation within an otherwise Ambisonics signal is simulated, and its perceived externalization is evaluated in a formal listening test. This investigation leads to the conclusion that externalization of a first order Ambisonics signal may be significantly improved by enhancing the direct sound component, up to a level similar to a third order Ambisonics signal.
Submitted 1 January, 2024;
originally announced January 2024.
-
Blind Localization of Room Reflections with Application to Spatial Audio
Authors:
Yogev Hadadi,
Vladimir Tourbabin,
Paul Calamia,
Boaz Rafaely
Abstract:
Blind estimation of early room reflections, without knowledge of the room impulse response, holds substantial value. The FF-PHALCOR (Frequency Focusing PHase ALigned CORrelation) method was recently developed for this objective, extending the original PHALCOR method from spherical to arbitrary arrays. However, previous studies only compared the two methods under limited conditions without presenting a comprehensive performance analysis. This study presents an advance by evaluating the performance of the algorithm in a wider range of conditions. Additionally, performance in terms of perception is investigated through a listening test. This test involves synthesizing room impulse responses from known room acoustics parameters and replacing the early reflections with the estimated ones. The importance of the estimated reflections for spatial perception is demonstrated through this test.
Submitted 21 December, 2023;
originally announced December 2023.
-
Study of speaker localization under dynamic and reverberant environments
Authors:
Daniel A. Mitchell,
Boaz Rafaely
Abstract:
Speaker localization in a reverberant environment is a fundamental problem in audio signal processing. Many solutions have been developed to tackle this problem. However, previous algorithms typically assume a stationary environment in which both the microphone array and the sound sources are not moving. With the emergence of wearable microphone arrays, acoustic scenes have become dynamic with moving sources and arrays. This calls for algorithms that perform well in dynamic environments. In this article, we study the performance of a speaker localization algorithm in such an environment. The study is based on the recently published EasyCom speech dataset recorded in reverberant and noisy environments using a wearable array on glasses. Although the localization algorithm performs well in static environments, its performance degrades substantially when used on the EasyCom dataset. The paper presents performance analysis and proposes methods for improvement.
Submitted 28 November, 2023;
originally announced November 2023.
-
iMagLS: Interaural Level Difference with Magnitude Least-Squares Loss for Optimized First-Order Head-Related Transfer Function
Authors:
Or Berebi,
Zamir Ben-Hur,
David Lou Alon,
Boaz Rafaely
Abstract:
Binaural reproduction for headphone-based listening is an active research area due to its widespread use in evolving technologies such as augmented and virtual reality (AR and VR). On the one hand, these applications demand high quality spatial audio perception to preserve the sense of immersion. On the other hand, recording devices may only have a few microphones, leading to low-order representations such as first-order Ambisonics (FOA). However, first-order Ambisonics leads to limited externalization and spatial resolution. In this paper, a novel head-related transfer function (HRTF) preprocessing optimization loss is proposed, and is minimized using nonlinear programming. The new method, denoted iMagLS, involves the introduction of an interaural level difference (ILD) error term to the now widely used MagLS optimization loss for the lateral plane angles. Results indicate that the ILD error could be substantially reduced, while the HRTF magnitude error remains similar to that obtained with MagLS. These results could prove beneficial to the overall spatial quality of first-order Ambisonics, while other reproduction methods could also benefit from considering this modified loss.
Submitted 28 November, 2023;
originally announced November 2023.
-
Performance Analysis Of Binaural Signal Matching (BSM) in the Time-Frequency Domain
Authors:
Ami Berger,
Vladimir Tourbabin,
Jacob Donley,
Zamir Ben-Hur,
Boaz Rafaely
Abstract:
The capture and reproduction of spatial audio is becoming increasingly popular, with the mushrooming of applications in teleconferencing, entertainment and virtual reality. Many binaural reproduction methods have been developed and studied extensively for spherical and other specially designed arrays. However, the recent increased popularity of wearable and mobile arrays requires the development of binaural reproduction methods for these arrays. One such method is binaural signal matching (BSM). However, to date this method has only been investigated with fixed matched filters designed for long audio recordings. With the aim of making the BSM method more adaptive to dynamic environments, this paper analyzes BSM with a parameterized sound-field in the time-frequency domain. The paper presents results of implementing the BSM method on a sound-field that was decomposed into its direct and reverberant components, and compares this implementation with the BSM computed for the entire sound-field in terms of performance for binaural reproduction of reverberant speech in a simulated environment.
Submitted 23 November, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
Study of speaker localization with binaural microphone array incorporating auditory filters and lateral angle estimation
Authors:
Yanir Maymon,
Israel Nelken,
Boaz Rafaely
Abstract:
Speaker localization for binaural microphone arrays has been widely studied for applications such as speech communication, video conferencing, and robot audition. Many methods developed for this task, including the direct path dominance (DPD) test, share common stages in their processing, which include transformation using the short-time Fourier transform (STFT), and a direction of arrival (DOA) search that is based on the head related transfer function (HRTF) set. In this paper, alternatives to these processing stages, motivated by human hearing, are proposed. These include incorporating an auditory filter bank to replace the STFT, and a new DOA search based on transformed HRTF as steering vectors. A simulation study and an experimental study are conducted to validate the proposed alternatives, and both are applied to two binaural DOA estimation methods; the results show that the proposed method compares favorably with current methods.
Submitted 31 October, 2023;
originally announced October 2023.
-
Optimal model-based beamforming and independent steering for spherical loudspeaker arrays
Authors:
Boaz Rafaely,
Dima Khaykin
Abstract:
Spherical loudspeaker arrays have been recently studied for directional sound radiation, where the compact arrangement of the loudspeaker units around a sphere facilitated the control of sound radiation in three-dimensional space. Directivity of sound radiation, or beamforming, was achieved by driving each loudspeaker unit independently, where the design of beamforming weights was typically achieved by numerical optimization with reference to a given desired beam pattern. This is in contrast to the methods already developed for microphone arrays in general and spherical microphone arrays in particular, where beamformer weights are designed to satisfy a wider range of objectives, related to directivity, robustness, and side-lobe level, for example. This paper presents the development of a physical-model-based, optimal beamforming framework for spherical loudspeaker arrays, similar to the framework already developed for spherical microphone arrays, facilitating efficient beamforming in the spherical harmonics domain, with independent steering. In particular, it is shown that from a beamforming perspective, the spherical loudspeaker array is similar to the spherical microphone array with microphones arranged around a rigid sphere. Experimental investigation validates the theoretical framework of beamformer design.
Submitted 6 October, 2023;
originally announced October 2023.
-
Zones of quiet in a broadband diffuse sound field
Authors:
Boaz Rafaely
Abstract:
The zones of quiet in pure-tone diffuse sound fields have been studied extensively in the past, both theoretically and experimentally, with the well-known result of the 10 dB attenuation extending to about a tenth of a wavelength. Recent results on the spatial-temporal correlation of broadband diffuse sound fields are used in this study to develop a theoretical framework for predicting the extension of the zones of quiet in broadband diffuse sound fields. This can be used to study the acoustic limitations imposed on local active sound control systems such as an active headrest when controlling broadband noise. Spatial-temporal correlation is first revised, after which derivations of the diffuse field zones of quiet in the near-field and the far-field of the secondary source are presented. The theoretical analysis is supported by simulation examples comparing the zones of quiet for diffuse fields excited by tonal and broadband signals. It is shown that as a first approximation the zone of quiet of a low-pass filtered noise is comparable to that of a pure-tone with a frequency equal to the center frequency of the broadband noise bandwidth.
Submitted 6 October, 2023;
originally announced October 2023.
-
Spatial sampling and beamforming for spherical microphone arrays
Authors:
Boaz Rafaely
Abstract:
Spherical microphone arrays have been recently studied for spatial sound recording, speech communication, and sound field analysis for room acoustics and noise control. Complementary theoretical studies presented progress in spatial sampling and beamforming methods. This paper reviews recent results in spatial sampling that facilitate a wide range of spherical array configurations, from a single rigid sphere to free positioning of microphones. The paper then presents an overview of beamforming methods recently presented for spherical arrays, from the widely used delay-and-sum and Dolph-Chebyshev, to the more advanced optimal methods, typically performed in the spherical harmonics domain.
Submitted 6 October, 2023;
originally announced October 2023.
-
Speaker localization using direct path dominance test based on sound field directivity
Authors:
Boaz Rafaely,
Koby Alhaiany
Abstract:
Estimation of the direction-of-arrival (DoA) of a speaker in a room is important in many audio signal processing applications. Environments with reverberation that masks the DoA information are particularly challenging. Recently, a DoA estimation method that is robust to reverberation has been developed. This method identifies time-frequency bins dominated by the contribution from the direct path, which carries the correct DoA information. However, its implementation is computationally demanding as it requires frequency smoothing to overcome the effect of coherent early reflections and matrix decomposition to apply the direct-path dominance (DPD) test. In this work, a novel computationally-efficient alternative to the DPD test is proposed, based on the directivity measure for sensor arrays, which requires neither frequency smoothing nor matrix decomposition, and which has been reformulated for sound field directivity with spherical microphone arrays. The paper presents the proposed method and a comparison to previous methods under a range of reverberation and noise conditions. Results demonstrate that the proposed method shows comparable performance to the original method in terms of robustness to reverberation and noise, and is about four times more computationally efficient for the given experiment.
Submitted 5 October, 2023;
originally announced October 2023.
-
Description of algorithms for Ben-Gurion University Submission to the LOCATA challenge
Authors:
Lior Madmoni,
Hanan Beit-On,
Hai Morgenstern,
Boaz Rafaely
Abstract:
This paper summarizes the methods used to localize the sources recorded for the LOCalization And TrAcking (LOCATA) challenge. The tasks of stationary sources and arrays were considered, i.e., tasks 1 and 2 of the challenge, which were recorded with the Nao robot array and the Eigenmike array. For both arrays, direction of arrival (DOA) estimation has been performed with measurements in the short-time Fourier transform domain, and with direct-path dominance (DPD) based tests, which aim to identify time-frequency (TF) bins dominated by the direct sound. For the recordings with Nao, a DPD test which is applied directly to the microphone signals was used. For the Eigenmike recordings, a DPD-based test designed for plane-wave density measurements in the spherical harmonics domain was used. After acquiring DOA estimates with TF bins that passed the DPD tests, a stage of k-means clustering is performed to assign a final DOA estimate for each speaker.
Submitted 12 December, 2018;
originally announced December 2018.