Search | arXiv e-print repository

doi 10.1561/2000000121

Sound Field Estimation: Theories and Applications

Abstract: The spatial information of sound plays a crucial role in various situations, ranging from daily activities to advanced engineering technologies. To fully utilize its potential, numerous research studies on spatial audio signal processing have been carried out in the literature. Sound field estimation is one of the key foundational technologies that can be applied to a wide range of acoustic signal… ▽ More The spatial information of sound plays a crucial role in various situations, ranging from daily activities to advanced engineering technologies. To fully utilize its potential, numerous research studies on spatial audio signal processing have been carried out in the literature. Sound field estimation is one of the key foundational technologies that can be applied to a wide range of acoustic signal processing techniques, including sound field reproduction using loudspeakers and binaural playback through headphones. The purpose of this paper is to present an overview of sound field estimation methods. After providing the necessary mathematical background, two different approaches to sound field estimation will be explained. This paper focuses on clarifying the essential theories of each approach, while also referencing state-of-the-art developments. Finally, several acoustic signal processing technologies will be discussed as examples of the application of sound field estimation. △ Less

Submitted 12 March, 2025; originally announced March 2025.

Comments: published in Foundations and Trends in Signal Processing, vol. 19, no. 1; see https://www.nowpublishers.com/article/Details/SIG-121

Journal ref: Foundations and Trends in Signal Processing, vol. 19, no. 1, pp. 1-98, 2025

arXiv:2501.05557 [pdf, other]

Mel-Spectrogram Inversion via Alternating Direction Method of Multipliers

Authors: Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

Abstract: Signal reconstruction from its mel-spectrogram is known as mel-spectrogram inversion and has many applications, including speech and foley sound synthesis. In this paper, we propose a mel-spectrogram inversion method based on a rigorous optimization algorithm. To reconstruct a time-domain signal with inverse short-time Fourier transform (STFT), both full-band STFT magnitude and phase should be pre… ▽ More Signal reconstruction from its mel-spectrogram is known as mel-spectrogram inversion and has many applications, including speech and foley sound synthesis. In this paper, we propose a mel-spectrogram inversion method based on a rigorous optimization algorithm. To reconstruct a time-domain signal with inverse short-time Fourier transform (STFT), both full-band STFT magnitude and phase should be predicted from a given mel-spectrogram. Their joint estimation has outperformed the cascaded full-band magnitude prediction and phase reconstruction by preventing error accumulation. However, the existing joint estimation method requires many iterations, and there remains room for performance improvement. We present an alternating direction method of multipliers (ADMM)-based joint estimation method motivated by its success in various nonconvex optimization problems including phase reconstruction. An efficient update of each variable is derived by exploiting the conditional independence among the variables. Our experiments demonstrate the effectiveness of the proposed method on speech and foley sounds. △ Less

Submitted 13 January, 2025; v1 submitted 9 January, 2025; originally announced January 2025.

Comments: Accepted to ICASSP 2025

arXiv:2408.14731 [pdf, other]

Physics-Informed Machine Learning For Sound Field Estimation

Authors: Shoichi Koyama, Juliano G. C. Ribeiro, Tomohiko Nakamura, Natsuki Ueno, Mirco Pezzoli

Abstract: The area of study concerning the estimation of spatial sound, i.e., the distribution of a physical quantity of sound such as acoustic pressure, is called sound field estimation, which is the basis for various applied technologies related to spatial audio processing. The sound field estimation problem is formulated as a function interpolation problem in machine learning in a simplified scenario. Ho… ▽ More The area of study concerning the estimation of spatial sound, i.e., the distribution of a physical quantity of sound such as acoustic pressure, is called sound field estimation, which is the basis for various applied technologies related to spatial audio processing. The sound field estimation problem is formulated as a function interpolation problem in machine learning in a simplified scenario. However, high estimation performance cannot be expected by simply applying general interpolation techniques that rely only on data. The physical properties of sound fields are useful a priori information, and it is considered extremely important to incorporate them into the estimation. In this article, we introduce the fundamentals of physics-informed machine learning (PIML) for sound field estimation and overview current PIML-based sound field estimation methods. △ Less

Submitted 26 August, 2024; originally announced August 2024.

Comments: Accepted to IEEE Signal Processing Magazine, Special Issue on Model-based and Data-Driven Audio Signal Processing

arXiv:2307.12232 [pdf, other]

Signal Reconstruction from Mel-spectrogram Based on Bi-level Consistency of Full-band Magnitude and Phase

Authors: Yoshiki Masuyama, Natsuki Ueno, Nobutaka Ono

Abstract: We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and… ▽ More We propose an optimization-based method for reconstructing a time-domain signal from a low-dimensional spectral representation such as a mel-spectrogram. Phase reconstruction has been studied to reconstruct a time-domain signal from the full-band short-time Fourier transform (STFT) magnitude. The Griffin-Lim algorithm (GLA) has been widely used because it relies only on the redundancy of STFT and is applicable to various audio signals. In this paper, we jointly reconstruct the full-band magnitude and phase by considering the bi-level relationships among the time-domain signal, its STFT coefficients, and its mel-spectrogram. The proposed method is formulated as a rigorous optimization problem and estimates the full-band magnitude based on the criterion used in GLA. Our experiments demonstrate the effectiveness of the proposed method on speech, music, and environmental signals. △ Less

Submitted 23 July, 2023; originally announced July 2023.

Comments: Accepted to IEEE WASPAA 2023

arXiv:2303.13027 [pdf, other]

Weighted Pressure and Mode Matching for Sound Field Reproduction: Theoretical and Experimental Comparisons

Authors: Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Abstract: Two sound field reproduction methods, weighted pressure matching and weighted mode matching, are theoretically and experimentally compared. The weighted pressure and mode matching are a generalization of conventional pressure and mode matching, respectively. Both methods are derived by introducing a weighting matrix in the pressure and mode matching. The weighting matrix in the weighted pressure m… ▽ More Two sound field reproduction methods, weighted pressure matching and weighted mode matching, are theoretically and experimentally compared. The weighted pressure and mode matching are a generalization of conventional pressure and mode matching, respectively. Both methods are derived by introducing a weighting matrix in the pressure and mode matching. The weighting matrix in the weighted pressure matching is defined on the basis of the kernel interpolation of the sound field from pressure at a discrete set of control points. In the weighted mode matching, the weighting matrix is defined by a regional integration of spherical wavefunctions. It is theoretically shown that the weighted pressure matching is a special case of the weighted mode matching by infinite-dimensional harmonic analysis for estimating expansion coefficients from pressure observations. The difference between the two methods are discussed through experiments. △ Less

Submitted 23 March, 2023; originally announced March 2023.

Comments: Accepted to Journal of Audio Engineering Society, Special Issue on Spatial Audio

arXiv:2112.06774 [pdf, ps, other]

Mean-square-error-based secondary source placement in sound field synthesis with prior information on desired field

Authors: Keisuke Kimura, Shoichi Koyama, Natsuki Ueno, Hiroshi Saruwatari

Abstract: A method of optimizing secondary source placement in sound field synthesis is proposed. Such an optimization method will be useful when the allowable placement region and available number of loudspeakers are limited. We formulate a mean-square-error-based cost function, incorporating the statistical properties of possible desired sound fields, for general linear-least-squares-based sound field syn… ▽ More A method of optimizing secondary source placement in sound field synthesis is proposed. Such an optimization method will be useful when the allowable placement region and available number of loudspeakers are limited. We formulate a mean-square-error-based cost function, incorporating the statistical properties of possible desired sound fields, for general linear-least-squares-based sound field synthesis methods, including pressure matching and (weighted) mode matching, whereas most of the current methods are applicable only to the pressure-matching method. An efficient greedy algorithm for minimizing the proposed cost function is also derived. Numerical experiments indicated that a high reproduction accuracy can be achieved by the placement optimized by the proposed method compared with the empirically used regular placement. △ Less

Submitted 9 December, 2021; originally announced December 2021.

Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

arXiv:2111.11045 [pdf, other]

Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Authors: Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Abstract: Sound field reproduction methods based on numerical optimization, which aim to minimize the error between synthesized and desired sound fields, are useful in many practical scenarios because of their flexibility in the array geometry of loudspeakers. However, the reproduction performance of these methods in a practical environment has not been sufficiently investigated. We evaluate weighted mode m… ▽ More Sound field reproduction methods based on numerical optimization, which aim to minimize the error between synthesized and desired sound fields, are useful in many practical scenarios because of their flexibility in the array geometry of loudspeakers. However, the reproduction performance of these methods in a practical environment has not been sufficiently investigated. We evaluate weighted mode matching, which is a sound field reproduction method based on the spherical wavefunction expansion of the sound field, in comparison with conventional pressure matching. We also introduce a method of infinite-dimensional harmonic analysis for estimating the expansion coefficients of the sound field from microphone measurements. Experimental results indicated that weighted mode matching using the expansion coefficients of the transfer functions estimated by the infinite-dimensional harmonic analysis outperforms conventional pressure matching, especially when the number of microphones is small. △ Less

Submitted 22 November, 2021; originally announced November 2021.

Comments: Accepted to International Conference on Immersive and 3D Audio (I3DA) 2021

arXiv:2110.04972 [pdf, ps, other]

Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Authors: Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, Hiroshi Saruwatari

Abstract: A method to estimate an acoustic field from discrete microphone measurements is proposed. A kernel-interpolation-based method using the kernel function formulated for sound field interpolation has been used in various applications. The kernel function with directional weighting makes it possible to incorporate prior information on source directions to improve estimation accuracy. However, in prior… ▽ More A method to estimate an acoustic field from discrete microphone measurements is proposed. A kernel-interpolation-based method using the kernel function formulated for sound field interpolation has been used in various applications. The kernel function with directional weighting makes it possible to incorporate prior information on source directions to improve estimation accuracy. However, in prior studies, parameters for directional weighting have been empirically determined. We propose a method to optimize these parameters using observation values, which is particularly useful when prior information on source directions is uncertain. The proposed algorithm is based on discretization of the parameters and representation of the kernel function as a weighted sum of sub-kernels. Two types of regularization for the weights, $L_1$ and $L_2$, are investigated. Experimental results indicate that the proposed method achieves higher estimation accuracy than the method without kernel learning. △ Less

Submitted 12 October, 2021; v1 submitted 10 October, 2021; originally announced October 2021.

Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

arXiv:2106.10801 [pdf, other]

MeshRIR: A Dataset of Room Impulse Responses on Meshed Grid Points For Evaluating Sound Field Analysis and Synthesis Methods

Authors: Shoichi Koyama, Tomoya Nishida, Keisuke Kimura, Takumi Abe, Natsuki Ueno, Jesper Brunnström

Abstract: A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, which are basically designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been exten… ▽ More A new impulse response (IR) dataset called "MeshRIR" is introduced. Currently available datasets usually include IRs at an array of microphones from several source positions under various room conditions, which are basically designed for evaluating speech enhancement and distant speech recognition methods. On the other hand, methods of estimating or controlling spatial sound fields have been extensively investigated in recent years; however, the current IR datasets are not applicable to validating and comparing these methods because of the low spatial resolution of measurement points. MeshRIR consists of IRs measured at positions obtained by finely discretizing a spatial region. Two subdatasets are currently available: one consists of IRs in a three-dimensional cuboidal region from a single source, and the other consists of IRs in a two-dimensional square region from an array of 32 sources. Therefore, MeshRIR is suitable for evaluating sound field analysis and synthesis methods. This dataset is freely available at https://sh01k.github.io/MeshRIR/ with some codes of sample applications. △ Less

Submitted 23 July, 2021; v1 submitted 20 June, 2021; originally announced June 2021.

Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

Showing 1–9 of 9 results for author: Ueno, N