Search | arXiv e-print repository

Physics-Informed Neural Network-Driven Sparse Field Discretization Method for Near-Field Acoustic Holography

Authors: Xinmeng Luan, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Abstract: We propose the Physics-Informed Neural Network-driven Sparse Field Discretization method (PINN-SFD), a novel self-supervised, physics-informed deep learning approach for addressing the Near-Field Acoustic Holography (NAH) problem. Unlike existing deep learning methods for NAH, which are predominantly supervised by large datasets, our approach does not require a training phase and is physics-informed. The wave propagation field is discretized into sparse regions, a process referred to as field discretization, which includes a set of source planes, to address the inverse problem. Our method employs the discretized Kirchhoff-Helmholtz integral as the wave propagation model. By incorporating virtual planes, additional constraints are enforced near the actual sound source, improving the reconstruction process. Optimization is carried out using Physics-Informed Neural Networks (PINNs), where physics-based constraints are integrated into the loss functions to account for both direct (from equivalent source plane to hologram plane) and additional (from virtual planes to hologram plane) wave propagation paths. Additionally, sparsity is enforced on the velocity of the equivalent sources. Our comprehensive validation across various rectangular and violin top plates, covering a wide range of vibrational modes, demonstrates that PINN-SFD consistently outperforms the conventional Compressive-Equivalent Source Method (C-ESM), particularly in terms of reconstruction accuracy for complex vibrational patterns. Notably, this method demonstrates reduced sensitivity to regularization parameters compared to C-ESM.

Submitted 1 May, 2025; originally announced May 2025.

Comments: 12 pages, 7 figures
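The PINN-SFD abstract frames NAH as a sparsity-regularized inverse problem: hologram pressures are modeled as a discretized propagation operator applied to equivalent-source velocities, with an L1 penalty on those velocities (the same ingredients as the C-ESM baseline). The minimal sketch below illustrates only that generic formulation, not the authors' method: it replaces the PINN optimizer with plain ISTA (iterative soft thresholding), uses a free-field monopole Green's function as a stand-in for the discretized Kirchhoff-Helmholtz integral, and omits the virtual planes. The geometry, frequency and helper names (`green_matrix`, `ista`) are hypothetical.

```python
import cmath
import math

# Sketch under assumptions: monopole Green's function instead of the
# discretized Kirchhoff-Helmholtz integral, ISTA instead of a PINN optimizer.
C = 343.0  # speed of sound in air [m/s]


def green_matrix(mics, sources, freq):
    """Free-field Green's function exp(-jkr) / (4 pi r); rows = mics, cols = sources.

    Hypothetical stand-in for the propagation operator (constant factors omitted).
    """
    k = 2 * math.pi * freq / C
    return [[cmath.exp(-1j * k * math.dist(m, s)) / (4 * math.pi * math.dist(m, s))
             for s in sources] for m in mics]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, y):
    """Conjugate-transpose product A^H y."""
    return [sum(A[i][j].conjugate() * y[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(z, tau):
    """Proximal operator of tau*|z| for complex z: shrink the magnitude, keep the phase."""
    mag = abs(z)
    return 0j if mag <= tau else z * (1 - tau / mag)


def objective(G, p, x, lam):
    resid = [a - b for a, b in zip(matvec(G, x), p)]
    return 0.5 * sum(abs(r) ** 2 for r in resid) + lam * sum(abs(v) for v in x)


def ista(G, p, lam, n_iter=500):
    """Minimize 0.5 ||G x - p||^2 + lam ||x||_1 by iterative soft thresholding."""
    # The squared Frobenius norm upper-bounds the Lipschitz constant ||G||_2^2,
    # so this step size guarantees a monotone decrease of the objective.
    step = 1.0 / sum(abs(g) ** 2 for row in G for g in row)
    x = [0j] * len(G[0])
    for _ in range(n_iter):
        grad = rmatvec(G, [a - b for a, b in zip(matvec(G, x), p)])
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, grad)]
    return x


# Illustrative geometry: 4 x 4 equivalent sources at z = 0,
# 3 x 3 hologram microphones at z = 0.05 m, 500 Hz, one active source.
sources = [(0.1 * i, 0.1 * j, 0.0) for i in range(4) for j in range(4)]
mics = [(0.02 + 0.12 * i, 0.02 + 0.12 * j, 0.05) for i in range(3) for j in range(3)]
G = green_matrix(mics, sources, freq=500.0)
true_x = [0j] * len(sources)
true_x[5] = 1.0 + 0j
p = matvec(G, true_x)
x_hat = ista(G, p, lam=1e-4)
```

With a very large weight (e.g. `lam=1e6`) every coefficient stays at zero, a quick way to see how strongly the solution of such a formulation depends on the regularization parameter, which is the sensitivity the abstract reports PINN-SFD reduces relative to C-ESM.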

arXiv:2504.20625 [pdf, other]

DiffusionRIR: Room Impulse Response Interpolation using Diffusion Models

Authors: Sagi Della Torre, Mirco Pezzoli, Fabio Antonacci, Sharon Gannot

Abstract: Room Impulse Responses (RIRs) characterize acoustic environments and are crucial in multiple audio signal processing tasks. High-quality RIR estimates drive applications such as virtual microphones, sound source localization, augmented reality, and data augmentation. However, obtaining RIR measurements with high spatial resolution is resource-intensive, making it impractical for large spaces or when dense sampling is required. This research addresses the challenge of estimating RIRs at unmeasured locations within a room using Denoising Diffusion Probabilistic Models (DDPM). Our method leverages the analogy between RIR matrices and image inpainting, transforming RIR data into a format suitable for diffusion-based reconstruction. Using simulated RIR data based on the image method, we demonstrate our approach's effectiveness on microphone arrays of different curvatures, from linear to semi-circular. Our method successfully reconstructs missing RIRs, even across large gaps between microphones. Under these conditions, it achieves accurate reconstruction, significantly outperforming baseline cubic spline interpolation in terms of Normalized Mean Square Error and Cosine Distance between actual and interpolated RIRs. This research highlights the potential of using generative models for effective RIR interpolation, paving the way for generating additional data from limited real-world measurements.

Submitted 29 April, 2025; originally announced April 2025.
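DiffusionRIR is evaluated with Normalized Mean Square Error and Cosine Distance between actual and interpolated RIRs. The sketch below uses common single-RIR definitions of both metrics; the paper may aggregate them differently (for example over microphones), so treat these formulas and the toy RIR as assumptions.

```python
import math

# Sketch under assumptions: common per-RIR definitions of NMSE and cosine distance.


def nmse_db(reference, estimate):
    """NMSE in dB: 10 log10(||ref - est||^2 / ||ref||^2)."""
    err = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    energy = sum(r ** 2 for r in reference)
    return 10 * math.log10(err / energy)


def cosine_distance(reference, estimate):
    """1 - cosine similarity: 0 for identical shapes, 2 for polarity-inverted ones."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    norms = math.sqrt(sum(r * r for r in reference)) * math.sqrt(sum(e * e for e in estimate))
    return 1.0 - dot / norms


rir = [1.0, 0.5, -0.25, 0.1, -0.05]  # toy RIR (hypothetical values)
scaled = [0.9 * v for v in rir]      # correct shape, 10% amplitude error
```

Cosine distance ignores a global gain error while NMSE penalizes it (here the 10% amplitude error alone gives an NMSE of -20 dB with zero cosine distance), which is one reason to report both.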

arXiv:2501.02871 [pdf, other]

Towards HRTF Personalization using Denoising Diffusion Models

Authors: Juan Camilo Albarracín Sánchez, Luca Comanducci, Mirco Pezzoli, Fabio Antonacci

Abstract: Head-Related Transfer Functions (HRTFs) have fundamental applications for realistic rendering in immersive audio scenarios. However, they are strongly subject-dependent, as they vary considerably with the shape of the ears, head, and torso. Thus, personalization procedures are required for accurate binaural rendering. Recently, Denoising Diffusion Probabilistic Models (DDPMs), a class of generative learning techniques, have been applied to solve a variety of signal processing-related problems. In this paper, we propose a first approach that uses a DDPM conditioned on anthropometric measurements to generate personalized Head-Related Impulse Responses (HRIRs), the time-domain representation of HRTFs. The results show the feasibility of DDPMs for HRTF personalization, achieving performance in line with state-of-the-art models.

Submitted 6 January, 2025; originally announced January 2025.

Comments: to appear in ICASSP 2025
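The HRIR is the time-domain counterpart of the HRTF: for each direction, the HRTF is the Fourier transform of the corresponding HRIR. A minimal, dependency-free sketch of that relation follows; the naive DFT, the sample rate and the helper names are illustrative assumptions, and in practice an FFT would be used.

```python
import cmath
import math

# Sketch under assumptions: naive DFT used only to keep the example self-contained.


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) for f in range(n)]


def hrtf_magnitude_db(hrir, fs):
    """Return (frequency in Hz, magnitude in dB) pairs from 0 Hz up to Nyquist."""
    n = len(hrir)
    spectrum = dft(hrir)
    return [(f * fs / n, 20 * math.log10(max(abs(spectrum[f]), 1e-12)))
            for f in range(n // 2 + 1)]


# A pure delay is the simplest possible HRIR: its HRTF magnitude is flat (0 dB).
delayed_impulse = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = hrtf_magnitude_db(delayed_impulse, fs=48000)
```

A measured HRIR would instead show the direction-dependent peaks and notches that make personalization necessary.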

arXiv:2412.18348 [pdf, other]

A Zero-Shot Physics-Informed Dictionary Learning Approach for Sound Field Reconstruction

Authors: Stefano Damiano, Federico Miotello, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti, Toon van Waterschoot

Abstract: Sound field reconstruction aims to estimate pressure fields in areas lacking direct measurements. Existing techniques often rely on strong assumptions or face challenges related to data availability or the explicit modeling of physical properties. To bridge these gaps, this study introduces a zero-shot, physics-informed dictionary learning approach to perform sound field reconstruction. Our method relies only on a few sparse measurements to learn a dictionary, without the need for additional training data. Moreover, by enforcing the Helmholtz equation during the optimization process, the proposed approach ensures that the reconstructed sound field is represented as a linear combination of a few physically meaningful atoms. Evaluations on real-world data show that our approach achieves comparable performance to state-of-the-art dictionary learning techniques, with the advantage of requiring only a few observations of the sound field and no training on a dataset.

Submitted 24 December, 2024; originally announced December 2024.

Comments: Accepted for publication at ICASSP 2025
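The zero-shot approach enforces the Helmholtz equation, ∇²p + k²p = 0 with k = 2πf/c, during dictionary optimization. The sketch below only shows how such a physics residual can be evaluated for a candidate atom using central differences; the plane-wave atoms, evaluation point and step size are illustrative assumptions, not the paper's parameterization.

```python
import cmath
import math

# Sketch under assumptions: 2-D plane-wave atoms and a finite-difference Laplacian.
C = 343.0  # speed of sound [m/s]


def plane_wave(kx, ky):
    """Plane-wave atom p(x, y) = exp(-j (kx x + ky y))."""
    return lambda x, y: cmath.exp(-1j * (kx * x + ky * y))


def helmholtz_residual(p, x, y, k, h=1e-3):
    """|laplacian(p) + k^2 p| at (x, y), Laplacian by second-order central differences."""
    lap = (p(x + h, y) + p(x - h, y) + p(x, y + h) + p(x, y - h) - 4 * p(x, y)) / h ** 2
    return abs(lap + k ** 2 * p(x, y))


k = 2 * math.pi * 500.0 / C  # wavenumber at 500 Hz
theta = 0.6                  # arbitrary propagation direction [rad]
good_atom = plane_wave(k * math.cos(theta), k * math.sin(theta))
bad_atom = plane_wave(1.5 * k * math.cos(theta), 1.5 * k * math.sin(theta))
```

An atom whose wavenumber matches the frequency yields a residual that is negligible relative to k², while an atom with the wrong wavenumber leaves a residual of about 1.25 k²; a physics-informed dictionary learner could penalize exactly this quantity.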

arXiv:2408.14731 [pdf, other]

Physics-Informed Machine Learning For Sound Field Estimation

Authors: Shoichi Koyama, Juliano G. C. Ribeiro, Tomohiko Nakamura, Natsuki Ueno, Mirco Pezzoli

Abstract: The area of study concerning the estimation of spatial sound, i.e., the distribution of a physical quantity of sound such as acoustic pressure, is called sound field estimation, which is the basis for various applied technologies related to spatial audio processing. The sound field estimation problem is formulated as a function interpolation problem in machine learning in a simplified scenario. However, high estimation performance cannot be expected by simply applying general interpolation techniques that rely only on data. The physical properties of sound fields are useful a priori information, and it is considered extremely important to incorporate them into the estimation. In this article, we introduce the fundamentals of physics-informed machine learning (PIML) for sound field estimation and overview current PIML-based sound field estimation methods.

Submitted 26 August, 2024; originally announced August 2024.

Comments: Accepted to IEEE Signal Processing Magazine, Special Issue on Model-based and Data-Driven Audio Signal Processing
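One classical way to inject physical prior knowledge into the interpolation problem described in this overview is kernel ridge regression with a kernel built from solutions of the Helmholtz equation. The sketch below assumes the 3-D diffuse-field kernel sin(kr)/(kr); since each kernel section is itself a homogeneous Helmholtz solution, so is the interpolant. Microphone positions, frequency, regularization and helper names are hypothetical, and the small linear solver is included only to keep the example dependency-free.

```python
import math

# Sketch under assumptions: diffuse-field sinc kernel, toy plane-wave measurements.
C = 343.0  # speed of sound [m/s]


def sinc_kernel(a, b, k):
    """Diffuse-field kernel sin(k r) / (k r) for 3-D sound fields (1 at r = 0)."""
    kr = k * math.dist(a, b)
    return 1.0 if kr == 0 else math.sin(kr) / kr


def solve(A, b):
    """Gaussian elimination with partial pivoting (works for complex right-hand sides)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_kernel_interpolator(mic_pos, pressures, freq, reg=1e-6):
    """Kernel ridge regression: alpha = (K + reg I)^-1 p, estimate(x) = sum alpha_i k(x, x_i)."""
    k = 2 * math.pi * freq / C
    n = len(mic_pos)
    K = [[sinc_kernel(mic_pos[i], mic_pos[j], k) + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, pressures)
    return lambda x: sum(a * sinc_kernel(x, m, k) for a, m in zip(alpha, mic_pos))


k = 2 * math.pi * 500.0 / C
mics = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 0.5), (0.5, 0.5, 0.5)]
# Toy measurements: a unit plane wave travelling along +x.
measured = [complex(math.cos(k * m[0]), -math.sin(k * m[0])) for m in mics]
estimate = fit_kernel_interpolator(mics, measured, freq=500.0)
```

The regularization weight trades fidelity at the microphones against robustness to noise; with the tiny value used here the estimate essentially reproduces the measurements at the microphone positions.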

arXiv:2407.18732 [pdf, ps, other]

A Physics-Informed Neural Network-Based Approach for the Spatial Upsampling of Spherical Microphone Arrays

Authors: Federico Miotello, Ferdinando Terminiello, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

Abstract: Spherical microphone arrays are convenient tools for capturing the spatial characteristics of a sound field. However, achieving high spatial resolution requires arrays with numerous capsules, which leads to expensive devices. To address this issue, we present a method for spatially upsampling spherical microphone arrays with a limited number of capsules. Our approach exploits a physics-informed neural network with Rowdy activation functions, leveraging physical constraints to provide high-order microphone array signals starting from low-order devices. Results show that, within its domain of application, our approach outperforms a state-of-the-art signal processing method for spherical microphone array upsampling.

Submitted 26 July, 2024; originally announced July 2024.

Comments: Accepted for publication at IWAENC 2024
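The cost argument in this abstract follows from a standard counting rule: a spherical-harmonic representation truncated at order N has (N + 1)² coefficients, so at least that many capsules are needed to estimate it (practical arrays often use more). The short sketch below makes the numbers concrete; it is background only and not part of the proposed PINN method.

```python
# Sketch: (N + 1)^2 coefficient count for a spherical-harmonic expansion of order N.


def sh_channels(order):
    """Number of spherical-harmonic coefficients up to order N: (N + 1)^2."""
    return (order + 1) ** 2


def max_order(n_capsules):
    """Highest order N with n_capsules >= (N + 1)^2 (a necessary condition only)."""
    order = 0
    while sh_channels(order + 1) <= n_capsules:
        order += 1
    return order
```

For instance, a 4-capsule array supports at most order 1, while reaching order 4 requires at least 25 capsules, which is why upsampling from low-order devices is attractive.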

arXiv:2403.09524 [pdf, other]

Physics-Informed Neural Network for Volumetric Sound field Reconstruction of Speech Signals

Authors: Marco Olivieri, Xenofon Karakonstantis, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti, Efren Fernandez-Grande

Abstract: Recent developments in acoustic signal processing have seen the integration of deep learning methodologies, alongside the continued prominence of classical wave expansion-based approaches, particularly in sound field reconstruction. Physics-Informed Neural Networks (PINNs) have emerged as a novel framework, bridging the gap between data-driven and model-based techniques for addressing physical phenomena governed by partial differential equations. This paper introduces a PINN-based approach for the recovery of arbitrary volumetric acoustic fields. The network incorporates the wave equation to impose a regularization on signal reconstruction in the time domain. This methodology enables the network to learn the underlying physics of sound propagation and allows for the complete characterization of the sound field based on a limited set of observations. The proposed method's efficacy is validated through experiments involving speech signals in a real-world environment, considering varying numbers of available measurements. Moreover, a comparative analysis is undertaken against state-of-the-art frequency-domain and time-domain reconstruction methods from existing literature, highlighting the increased accuracy across the various measurement configurations.

Submitted 23 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.
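This PINN regularizes the time-domain reconstruction with the wave equation ∇²p − (1/c²)∂²p/∂t² = 0. A minimal sketch of such a composite loss follows, under two simplifying assumptions: derivatives are taken by finite differences rather than automatic differentiation, and the "network" is any Python callable p(x, y, z, t). The weight, sampling points and test fields are hypothetical.

```python
import math

# Sketch under assumptions: finite differences replace autodiff, and a closed-form
# callable replaces the neural network.
C = 343.0  # speed of sound [m/s]


def wave_residual(p, x, y, z, t, h=1e-3, dt=1e-6):
    """laplacian(p) - (1/c^2) d2p/dt2 at one point, via central differences."""
    p0 = p(x, y, z, t)
    lap = (p(x + h, y, z, t) + p(x - h, y, z, t)
           + p(x, y + h, z, t) + p(x, y - h, z, t)
           + p(x, y, z + h, t) + p(x, y, z - h, t) - 6 * p0) / h ** 2
    p_tt = (p(x, y, z, t + dt) - 2 * p0 + p(x, y, z, t - dt)) / dt ** 2
    return lap - p_tt / C ** 2


def pinn_loss(p, observations, collocation, weight=1.0):
    """Mean squared data misfit plus weighted mean squared wave-equation residual."""
    data = sum((p(*pt) - val) ** 2 for pt, val in observations) / len(observations)
    phys = sum(wave_residual(p, *pt) ** 2 for pt in collocation) / len(collocation)
    return data + weight * phys


k = 2 * math.pi * 500.0 / C  # wavenumber at 500 Hz


def true_field(x, y, z, t):
    return math.sin(k * x - C * k * t)  # plane wave travelling at c along +x


def wrong_field(x, y, z, t):
    return math.sin(k * x - 1.5 * C * k * t)  # same wavelength, speed 1.5 c


observations = [((0.1 * i, 0.0, 0.0, 1e-3), true_field(0.1 * i, 0.0, 0.0, 1e-3)) for i in range(5)]
collocation = [(0.05 * i, 0.02 * i, -0.01 * i, 5e-4 * i) for i in range(10)]
```

A plane wave travelling at the speed of sound drives the physics term to (numerically) zero, whereas a wave travelling at 1.5 c is heavily penalized by the residual term; a real PINN would minimize this loss over network weights.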

arXiv:2402.13896 [pdf, other]

HOMULA-RIR: A Room Impulse Response Dataset for Teleconferencing and Spatial Audio Applications Acquired Through Higher-Order Microphones and Uniform Linear Microphone Arrays

Authors: Federico Miotello, Paolo Ostan, Mirco Pezzoli, Luca Comanducci, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

Abstract: In this paper, we present HOMULA-RIR, a dataset of room impulse responses (RIRs) acquired using both higher-order microphones (HOMs) and a uniform linear array (ULA), in order to model a remote attendance teleconferencing scenario. Specifically, measurements were performed in a seminar room, where a 64-microphone ULA was used as a multichannel audio acquisition system in the proximity of the speakers, while HOMs were used to model 25 attendees actually present in the seminar room. The HOMs cover a wide area of the room, which also makes the dataset suitable for virtual acoustics applications. Through the measurement of the reverberation time and clarity index, and sample applications such as source localization and separation, we demonstrate the effectiveness of the HOMULA-RIR dataset.

Submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted for publication at ICASSP 2024 - HSCMA Workshop
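HOMULA-RIR is characterized through reverberation time and clarity index. A sketch of two textbook estimators follows: T60 extrapolated from a T30 fit to the Schroeder energy decay curve, and C50. It assumes the RIR starts at the direct-sound arrival and uses a single broadband band; standardized measurements (ISO 3382-1) use octave bands and more careful onset detection, and the abstract does not state the dataset paper's exact procedure or which clarity index (C50 or C80) is used.

```python
import math

# Sketch under assumptions: broadband analysis, RIR already aligned to the direct sound.


def schroeder_edc_db(rir):
    """Schroeder backward-integrated energy decay curve in dB (0 dB at t = 0)."""
    edc, acc = [], 0.0
    for v in reversed(rir):
        acc += v * v
        edc.append(acc)
    edc.reverse()
    return [10 * math.log10(e / edc[0]) if e > 0 else float("-inf") for e in edc]


def t60_from_t30(rir, fs):
    """Least-squares line on the EDC between -5 and -35 dB, extrapolated to -60 dB."""
    pts = [(n / fs, e) for n, e in enumerate(schroeder_edc_db(rir)) if -35.0 <= e <= -5.0]
    mt = sum(t for t, _ in pts) / len(pts)
    me = sum(e for _, e in pts) / len(pts)
    slope = (sum((t - mt) * (e - me) for t, e in pts)
             / sum((t - mt) ** 2 for t, _ in pts))  # dB per second
    return -60.0 / slope


def clarity_c50(rir, fs):
    """C50 = 10 log10(energy in the first 50 ms / energy after 50 ms)."""
    n50 = round(0.05 * fs)
    early = sum(v * v for v in rir[:n50])
    late = sum(v * v for v in rir[n50:])
    return 10 * math.log10(early / late)


# Synthetic 1 s RIR envelope whose energy decays by exactly 60 dB every 0.5 s.
fs = 8000
rir = [10 ** (-3 * n / (fs * 0.5)) for n in range(fs)]
```

For this ideal exponential decay the estimates come out at about 0.5 s for T60 and about 4.74 dB for C50 (the energy drops 6 dB within the first 50 ms); a measured RIR would first be filtered into octave bands.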

arXiv:2402.04866 [pdf, other]

Room Transfer Function Reconstruction Using Complex-valued Neural Networks and Irregularly Distributed Microphones

Authors: Francesca Ronchini, Luca Comanducci, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Abstract: Reconstructing the room transfer functions needed to calculate the complex sound field in a room has several important real-world applications. However, an impractical number of microphones is often required. Recently, in addition to classical signal processing methods, deep learning techniques have been applied to reconstruct the room transfer function starting from a very limited set of measurements at scattered points in the room. In this paper, we employ complex-valued neural networks to estimate room transfer functions in the frequency range of the first room resonances, using a few irregularly distributed microphones. To the best of our knowledge, this is the first time that complex-valued neural networks have been used to estimate room transfer functions. To analyze the benefits of applying complex-valued optimization to the considered task, we compare the proposed technique with a state-of-the-art kernel-based signal processing approach for sound field reconstruction, showing that the proposed technique exhibits relevant advantages in terms of phase accuracy and overall quality of the reconstructed sound field. For informative purposes, we also compare the model with a similarly structured data-driven approach that, however, applies a real-valued neural network to reconstruct only the magnitude of the sound field.

Submitted 11 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: Accepted at EUSIPCO 2024
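The abstract does not specify the complex-valued architecture, so the following is only a generic sketch under stated assumptions: a dense layer with complex weights followed by "CReLU" (ReLU applied separately to real and imaginary parts, one common choice in complex-valued networks), plus a loss on the complex error. The last helper illustrates why a magnitude-only model cannot be judged on phase.

```python
# Sketch under assumptions: CReLU activation and a complex MSE loss are illustrative
# choices, not necessarily those of the paper.


def crelu(z):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


def complex_dense(x, weights, bias):
    """Complex fully connected layer y = CReLU(W x + b) with complex W and b."""
    return [crelu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, bias)]


def complex_mse(pred, target):
    """Mean squared modulus of the complex error: sensitive to magnitude and phase."""
    return sum(abs(p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def magnitude_mse(pred, target):
    """Magnitude-only error, as a real-valued magnitude model would use: phase-blind."""
    return sum((abs(p) - abs(t)) ** 2 for p, t in zip(pred, target)) / len(pred)
```

A prediction with the right magnitude but a 90° phase error, such as `1j` against a target of `1`, has zero magnitude error yet a complex MSE of 2.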

arXiv:2312.08821 [pdf, other]

Reconstruction of Sound Field through Diffusion Models

Authors: Federico Miotello, Luca Comanducci, Mirco Pezzoli, Alberto Bernardini, Fabio Antonacci, Augusto Sarti

Abstract: Reconstructing the sound field in a room is an important task for several applications, such as sound control and augmented reality (AR) or virtual reality (VR). In this paper, we propose a data-driven generative model for reconstructing the magnitude of acoustic fields in rooms with a focus on the modal frequency range. We introduce, for the first time, the use of a conditional Denoising Diffusion Probabilistic Model (DDPM) trained to reconstruct the sound field (SF-Diff) over an extended domain. The architecture is designed to be conditioned on a limited set of available measurements at different frequencies and to generate the sound field at unknown target locations. The results show that SF-Diff provides accurate reconstructions, outperforming a state-of-the-art baseline based on kernel interpolation.

Submitted 21 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted for publication at ICASSP 2024
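SF-Diff is built on a conditional DDPM. As background, the DDPM forward (noising) process has the closed form x_t = √ᾱ_t x_0 + √(1 − ᾱ_t) ε, where ᾱ_t is the cumulative product of (1 − β_s). The sketch below uses the linear β schedule from 1e-4 to 0.02 over 1000 steps popularized by Ho et al.; SF-Diff's actual schedule, conditioning on measurements and network are not given in the abstract and are not modeled here.

```python
import math
import random

# Sketch under assumptions: generic DDPM forward process with a linear beta schedule;
# the conditioning on measured points used by SF-Diff is not modeled.


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s) for a linear beta schedule."""
    alpha_bar, acc = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        acc *= 1.0 - beta
        alpha_bar.append(acc)
    return alpha_bar


def q_sample(x0, t, alpha_bar, eps=None, rng=random):
    """Forward diffusion: x_t = sqrt(alpha_bar_t) x0 + sqrt(1 - alpha_bar_t) eps."""
    if eps is None:
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


alpha_bar = linear_alpha_bar()
noisy = q_sample([0.5, 0.8, 0.2, 0.9], 500, alpha_bar, rng=random.Random(0))
```

At the last step ᾱ_T is below 1e-4, so x_T is essentially pure Gaussian noise; the trained network learns to reverse these steps, guided in SF-Diff by the available measurements.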

arXiv:2306.11509 [pdf, other]

doi 10.61782/fa.2023.1182

Implicit neural representation with physics-informed neural networks for the reconstruction of the early part of room impulse responses

Authors: Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Abstract: Recently, deep learning and machine learning approaches have been widely employed for various applications in acoustics. Nonetheless, in the area of sound field processing and reconstruction, classic methods based on the solutions of the wave equation are still widespread. Recently, physics-informed neural networks have been proposed as a deep learning paradigm for solving the partial differential equations that govern physical phenomena, bridging the gap between purely data-driven and model-based methods. Here, we exploit physics-informed neural networks to reconstruct the early part of missing room impulse responses in a uniform linear array. This methodology allows us to exploit the underlying law of acoustics, i.e., the wave equation, forcing the neural network to generate physically meaningful solutions given only a limited number of data points. The results on real measurements show that the proposed model achieves accurate reconstruction and performance in line with state-of-the-art deep learning and compressed sensing techniques while maintaining a lightweight architecture.

Submitted 20 June, 2023; originally announced June 2023.

Comments: Accepted for publication at Forum Acusticum 2023

Journal ref: Proceedings of the 10th Convention of the European Acoustics Association Forum Acusticum 2023 (pp. 2177-2184)

arXiv:2303.08480 [pdf, other]

doi 10.1109/ICASSP49357.2023.10095324

Acoustic source localization in the spherical harmonics domain exploiting low-rank approximations

Authors: Maximo Cobos, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Abstract: Acoustic signal processing in the spherical harmonics domain (SHD) is an active research area that exploits the signals acquired by higher-order microphone arrays. An important task in this area is the localization of active sound sources. In this paper, we propose a simple yet effective method to localize prominent acoustic sources in adverse acoustic scenarios. By using a proper normalization and arrangement of the estimated spherical harmonic coefficients, we exploit low-rank approximations to estimate the far-field modal directional pattern of the dominant source at each time frame. The experiments confirm the validity of the proposed approach, with superior performance compared to other recent SHD-based approaches.

Submitted 15 March, 2023; originally announced March 2023.

Comments: To appear in ICASSP 2023

Journal ref: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
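The localization method relies on a low-rank approximation of suitably normalized and arranged spherical-harmonic coefficients. The abstract does not say how the matrix is arranged, so the sketch below only shows the generic step: the best rank-1 approximation σ₁u₁v₁ᴴ (a truncated SVD), computed with power iteration on AᴴA. The helper names, iteration count and toy matrices are illustrative assumptions.

```python
import math

# Sketch under assumptions: generic rank-1 approximation; the arrangement of the
# spherical-harmonic coefficients into the matrix A is left to the caller.


def gram(A):
    """Gram matrix A^H A for a (complex) matrix given as a list of rows."""
    n = len(A[0])
    return [[sum(A[r][i].conjugate() * A[r][j] for r in range(len(A))) for j in range(n)]
            for i in range(n)]


def rank1_approx(A, n_iter=100):
    """Best rank-1 approximation A v v^H, with v the top right singular vector."""
    G = gram(A)
    v = [1.0 + 0j] * len(G)  # start vector; must not be orthogonal to the top singular vector
    for _ in range(n_iter):
        w = [sum(g * vj for g, vj in zip(row, v)) for row in G]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * vj for a, vj in zip(row, v)) for row in A]  # equals sigma_1 * u_1
    return [[avi * vj.conjugate() for vj in v] for avi in Av]


# An exactly rank-1 matrix is reproduced; a diagonal matrix keeps only its largest entry.
u, w = [1.0, -1j, 2.0], [1.0, 2j, -0.5]
A = [[ui * wj for wj in w] for ui in u]
approx = rank1_approx(A)
approx_diag = rank1_approx([[3.0, 0.0], [0.0, 1.0]])
```

With a single dominant source, most of the energy of such a matrix concentrates in the first singular component, which is what makes a rank-1 model a natural estimate of the dominant directional pattern.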

arXiv:2103.16935 [pdf, other]

Near field Acoustic Holography on arbitrary shapes using Convolutional Neural Network

Authors: Marco Olivieri, Mirco Pezzoli, Fabio Antonacci, Augusto Sarti

Abstract: Near-field Acoustic Holography (NAH) is a well-known problem aimed at estimating the vibrational velocity field of a structure by means of acoustic measurements. In this paper, we propose a NAH technique based on a Convolutional Neural Network (CNN). The devised CNN predicts the vibrational field on the surface of arbitrarily shaped plates (violin plates) with orthotropic material properties from a limited number of measurements. In particular, the architecture, named Super Resolution CNN (SRCNN), is able to estimate the vibrational field with a higher spatial resolution than that of the input pressure. The pressure and velocity datasets have been generated through Finite Element Method simulations. We validate the proposed method by comparing the estimates with the synthesized ground truth and with a state-of-the-art technique. Moreover, we evaluate the robustness of the devised network against noisy input data.

Submitted 29 June, 2021; v1 submitted 31 March, 2021; originally announced March 2021.

Comments: accepted for publication in EUSIPCO21

Showing 1–13 of 13 results for author: Pezzoli, M