Showing 1–11 of 11 results for author: Fuchs, G

Searching in archive eess.
  1. arXiv:2506.01731

    eess.AS

    Benchmarking Neural Speech Codec Intelligibility with SITool

    Authors: Anna Leschanowsky, Kishor Kayyar Lakshminarayana, Anjana Rajasekhar, Lyonel Behringer, Ibrahim Kilinc, Guillaume Fuchs, Emanuël A. P. Habets

    Abstract: Speech intelligibility assessment is essential for evaluating neural speech codecs, yet most evaluation efforts focus on overall quality rather than intelligibility. Only a few publicly available tools exist for conducting standardized intelligibility tests, like the Diagnostic Rhyme Test (DRT) and Modified Rhyme Test (MRT). We introduce the Speech Intelligibility Toolkit for Subjective Evaluation…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: submitted to Interspeech

  2. arXiv:2505.16404

    eess.AS cs.SD

    UBGAN: Enhancing Coded Speech with Blind and Guided Bandwidth Extension

    Authors: Kishan Gupta, Srikanth Korse, Andreas Brendel, Nicola Pia, Guillaume Fuchs

    Abstract: In practical applications of speech codecs, a multitude of factors, such as the quality of the radio connection, limited hardware, or the required user experience, necessitate trade-offs between achievable perceptual quality, resulting bitrate, and computational complexity. Most conventional and neural speech codecs operate on wideband (WB) speech signals to achieve this compromise. To further enhance th…

    Submitted 22 May, 2025; originally announced May 2025.

  3. A first-order DirAC-based parametric Ambisonic coder for immersive communications

    Authors: Guillaume Fuchs, Florin Ghido, Dominik Weckbecker, Oliver Thiergart

    Abstract: Directional Audio Coding (DirAC) is a proven method for parametrically representing a 3D audio scene in B-format and is capable of reproducing it on arbitrary loudspeaker layouts. Although such a method seems well suited for low-bitrate Ambisonic transmission, little work has been done on the feasibility of building a real system upon it. In this paper, we present a DirAC-based coding for Higher-O…

    Submitted 30 March, 2025; originally announced March 2025.

    Comments: Accepted at ICASSP'25

  4. arXiv:2406.08900

    eess.AS cs.SD eess.SP

    On Improving Error Resilience of Neural End-to-End Speech Coders

    Authors: Kishan Gupta, Nicola Pia, Srikanth Korse, Andreas Brendel, Guillaume Fuchs, Markus Multrus

    Abstract: Error-resilience tools such as Packet Loss Concealment (PLC) and Forward Error Correction (FEC) are essential for maintaining reliable speech communication in applications such as Voice over Internet Protocol (VoIP), where packets are frequently delayed or lost. In recent times, end-to-end neural speech codecs have seen a significant rise due to their ability to transmit speech signals at low bitrates bu…

    Submitted 13 June, 2024; originally announced June 2024.

  5. arXiv:2405.08417

    eess.AS cs.SD

    Neural Speech Coding for Real-time Communications using Constant Bitrate Scalar Quantization

    Authors: Andreas Brendel, Nicola Pia, Kishan Gupta, Lyonel Behringer, Guillaume Fuchs, Markus Multrus

    Abstract: Neural audio coding has emerged as an active research direction, promising good audio quality at very low bitrates that classical coding techniques cannot achieve. Here, end-to-end trainable autoencoder-like models, which learn a discrete representation in the autoencoder's bottleneck, represent the state of the art. This allows for efficient transmission of the input audio signal. The learne…

    Submitted 19 September, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

  6. arXiv:2207.03282

    eess.AS cs.LG cs.SD eess.SP

    NESC: Robust Neural End-2-End Speech Coding with GANs

    Authors: Nicola Pia, Kishan Gupta, Srikanth Korse, Markus Multrus, Guillaume Fuchs

    Abstract: Neural networks have proven to be a formidable tool for tackling speech coding at very low bit rates. However, designing a neural coder that operates robustly under real-world conditions remains a major challenge. Therefore, we present the Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The…

    Submitted 7 July, 2022; originally announced July 2022.

    Comments: Paper accepted to Interspeech 2022. Please check our demo at: https://fhgspco.github.io/nesc/

  7. arXiv:2201.13093

    eess.AS cs.SD

    PostGAN: A GAN-Based Post-Processor to Enhance the Quality of Coded Speech

    Authors: Srikanth Korse, Nicola Pia, Kishan Gupta, Guillaume Fuchs

    Abstract: The quality of speech coded by transform coding is affected by various artefacts, especially when the bitrates available to quantize the frequency components become too low. To mitigate these coding artefacts and enhance the quality of coded speech, a post-processor that relies on a priori information transmitted from the encoder is traditionally employed at the decoder side. In recent years, several da…

    Submitted 31 January, 2022; originally announced January 2022.

    Comments: Accepted to ICASSP 2022

  8. arXiv:2201.12039

    eess.AS cs.LG cs.SD eess.SP

    A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain

    Authors: Kishan Gupta, Srikanth Korse, Bernd Edler, Guillaume Fuchs

    Abstract: Frequency-domain processing, and in particular the use of the Modified Discrete Cosine Transform (MDCT), is the most widespread approach to audio coding. However, at low bitrates, audio quality, especially for speech, degrades drastically due to the lack of available bits to directly code the transform coefficients. Traditionally, post-filtering has been used to mitigate artefacts in the coded speech…

    Submitted 28 January, 2022; originally announced January 2022.

  9. arXiv:2108.04051

    eess.AS cs.LG cs.SD eess.SP

    A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate

    Authors: Ahmed Mustafa, Jan Büthe, Srikanth Korse, Kishan Gupta, Guillaume Fuchs, Nicola Pia

    Abstract: Recently, GAN vocoders have seen rapid progress in speech synthesis, starting to outperform autoregressive models in perceptual quality at a much higher generation speed. However, autoregressive vocoders are still the common choice for neural generation of speech signals coded at very low bit rates. In this paper, we present a GAN vocoder that is able to generate wideband speech waveforms from pa…

    Submitted 9 August, 2021; originally announced August 2021.

    Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021)

  10. arXiv:2011.01557

    eess.AS cs.LG cs.SD eess.SP

    StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

    Authors: Ahmed Mustafa, Nicola Pia, Guillaume Fuchs

    Abstract: In recent years, neural vocoders have surpassed classical speech generation approaches in the naturalness and perceptual quality of the synthesized speech. Computationally heavy models like WaveNet and WaveGlow achieve the best results, while lightweight GAN models, e.g., MelGAN and Parallel WaveGAN, remain inferior in terms of perceptual quality. We therefore propose StyleMelGAN, a lightweight neural voco…

    Submitted 12 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: Accepted to ICASSP 2021

  11. Enhancement Of Coded Speech Using a Mask-Based Post-Filter

    Authors: Srikanth Korse, Kishan Gupta, Guillaume Fuchs

    Abstract: The quality of speech codecs deteriorates at low bitrates due to high quantization noise. A post-filter is generally employed to enhance the quality of the coded speech. In this paper, a data-driven post-filter relying on masking in the time-frequency domain is proposed. A fully connected neural network (FCNN), a convolutional encoder-decoder (CED) network and a long short-term memory (LSTM) netwo…

    Submitted 12 October, 2020; originally announced October 2020.

    Comments: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)