Search | arXiv e-print repository

DRED: Deep REDundancy Coding of Speech Using a Rate-Distortion-Optimized Variational Autoencoder

Authors: Jean-Marc Valin, Jan Büthe, Ahmed Mustafa, Michael Klingbeil

Abstract: Despite recent advancements in packet loss concealment (PLC) using deep learning techniques, packet loss remains a significant challenge in real-time speech communication. Redundancy has been used in the past to recover the missing information during losses. However, conventional redundancy techniques are limited in the maximum loss duration they can cover and are often unsuitable for burst packet loss. We propose a new approach based on a rate-distortion-optimized variational autoencoder (RDO-VAE), allowing us to optimize a deep speech compression algorithm for the task of encoding large amounts of redundancy at very low bitrate. The proposed Deep REDundancy (DRED) algorithm can transmit up to 50x redundancy using less than 32 kb/s. Results show that DRED outperforms the existing Opus codec redundancy. We also demonstrate its benefits when operating in the context of WebRTC.

Submitted 24 October, 2024; v1 submitted 8 December, 2022; originally announced December 2022.

Comments: Accepted in IEEE Journal of Selected Topics in Signal Processing, 7 pages

arXiv:2205.05785

Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model

Authors: Jean-Marc Valin, Ahmed Mustafa, Christopher Montgomery, Timothy B. Terriberry, Michael Klingbeil, Paris Smaragdis, Arvindh Krishnaswamy

Abstract: As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC). PLC is a challenging task because it not only involves real-time speech synthesis, but also frequent transitions between the received audio and the synthesized concealment. We propose a hybrid neural PLC architecture where the missing speech is synthesized using a generative model conditioned using a predictive model. The resulting algorithm achieves natural concealment that surpasses the quality of existing conventional PLC algorithms and ranked second in the Interspeech 2022 PLC Challenge. We show that our solution not only works for uncompressed audio, but is also applicable to a modern speech codec.

Submitted 11 May, 2022; originally announced May 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:1104.2208

doi 10.1103/PhysRevB.84.155325

Gain in Three-Dimensional Metamaterials utilizing Semiconductor Quantum Structures

Authors: Stephan Schwaiger, Matthias Klingbeil, Jochen Kerbst, Andreas Rottler, Ricardo Costa, Aune Koitmäe, Markus Bröll, Christian Heyn, Yuliya Stark, Detlef Heitmann, Stefan Mendach

Abstract: We demonstrate gain in a three-dimensional metal/semiconductor metamaterial by the integration of optically active semiconductor quantum structures. The rolling-up of a metallic structure on top of strained semiconductor layers containing a quantum well allows us to achieve a three-dimensional superlattice consisting of alternating layers of lossy metallic and amplifying gain material. We show that the transmission through the superlattice can be enhanced by exciting the quantum well optically under both pulsed and continuous-wave excitation. This points out that our structures can be used as a starting point for arbitrary three-dimensional metamaterials including gain.

Submitted 12 April, 2011; originally announced April 2011.

Showing 1–3 of 3 results for author: Klingbeil, M