Search | arXiv e-print repository

doi 10.1109/ICASSP49660.2025.10888497

A first-order DirAC-based parametric Ambisonic coder for immersive communications

Authors: Guillaume Fuchs, Florin Ghido, Dominik Weckbecker, Oliver Thiergart

Abstract: Directional Audio Coding (DirAC) is a proven method for parametrically representing a 3D audio scene in B-format and is capable of reproducing it on arbitrary loudspeaker layouts. Although such a method seems well suited for low bitrate Ambisonic transmission, little work has been done on the feasibility of building a real system upon it. In this paper, we present a DirAC-based coding for Higher-O… ▽ More Directional Audio Coding (DirAC) is a proven method for parametrically representing a 3D audio scene in B-format and is capable of reproducing it on arbitrary loudspeaker layouts. Although such a method seems well suited for low bitrate Ambisonic transmission, little work has been done on the feasibility of building a real system upon it. In this paper, we present a DirAC-based coding for Higher-Order Ambisonics (HOA), developed as part of a standardisation effort to extend the 3GPP EVS codec to immersive communications. Starting from the first-order DirAC model, we show how to reduce algorithmic delay, the bitrate required for the parameters and complexity by bringing the full synthesis in the spherical harmonic domain. The evaluation of the proposed technique for coding 3\textsuperscript{rd} order Ambisonics at bitrates from 32 to 128 kbps shows the relevance of the parametric approach compared with existing solutions. △ Less

Submitted 30 March, 2025; originally announced March 2025.

Comments: Accepted at ICASSP'25

arXiv:2409.13502 [pdf, other]

Neural Directional Filtering: Far-Field Directivity Control With a Small Microphone Array

Authors: Julian Wechsler, Srikanth Raj Chetupalli, Mhd Modar Halimeh, Oliver Thiergart, Emanuël A. P. Habets

Abstract: Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a re… ▽ More Capturing audio signals with specific directivity patterns is essential in speech communication. This study presents a deep neural network (DNN)-based approach to directional filtering, alleviating the need for explicit signal models. More specifically, our proposed method uses a DNN to estimate a single-channel complex mask from the signals of a microphone array. This mask is then applied to a reference microphone to render a signal that exhibits a desired directivity pattern. We investigate the training dataset composition and its effect on the directivity realized by the DNN during inference. Using a relatively small DNN, the proposed method is found to approximate the desired directivity pattern closely. Additionally, it allows for the realization of higher-order directivity patterns using a small number of microphones, which is a difficult task for linear and parametric directional filtering. △ Less

Submitted 20 September, 2024; originally announced September 2024.

Comments: Presented at the International Workshop on Acoustic Signal Enhancement (IWAENC), 2024

arXiv:2312.08132 [pdf, ps, other]

doi 10.1109/ICASSP48485.2024.10448353

Ultra Low Complexity Deep Learning Based Noise Suppression

Authors: Shrishti Saha Shetu, Soumitro Chakrabarty, Oliver Thiergart, Edwin Mabande

Abstract: This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. The proposed approach utilizes a two-stage processing framework, employing channelwise feature reorientation to reduce the computational load of convolutional operations. By combining this with a modified power law compression… ▽ More This paper introduces an innovative method for reducing the computational complexity of deep neural networks in real-time speech enhancement on resource-constrained devices. The proposed approach utilizes a two-stage processing framework, employing channelwise feature reorientation to reduce the computational load of convolutional operations. By combining this with a modified power law compression technique for enhanced perceptual quality, this approach achieves noise suppression performance comparable to state-of-the-art methods with significantly less computational requirements. Notably, our algorithm exhibits 3 to 4 times less computational complexity and memory usage than prior state-of-the-art approaches. △ Less

Submitted 13 December, 2023; originally announced December 2023.

Showing 1–3 of 3 results for author: Thiergart, O