Showing 1–19 of 19 results for author: Reddy, C K A

Searching in archive eess.
  1. arXiv:2506.04890

    eess.AS

    Multivariate Probabilistic Assessment of Speech Quality

    Authors: Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee

    Abstract: The mean opinion score (MOS) is a standard metric for assessing speech quality, but its singular focus fails to identify specific distortions when low scores are observed. The NISQA dataset addresses this limitation by providing ratings across four additional dimensions: noisiness, coloration, discontinuity, and loudness, alongside MOS. In this paper, we extend the explored univariate MOS estimati…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025

  2. arXiv:2504.21528

    eess.AS

    Impairments are Clustered in Latents of Deep Neural Network-based Speech Quality Models

    Authors: Fredrik Cumlin, Xinyu Liang, Victor Ungureanu, Chandan K. A. Reddy, Christian Schüldt, Saikat Chatterjee

    Abstract: In this article, we provide an experimental observation: Deep neural network (DNN) based speech quality assessment (SQA) models have inherent latent representations where many types of impairments are clustered. While DNN-based SQA models are not trained for impairment classification, our experiments show good impairment classification results in an appropriate SQA latent representation. We invest…

    Submitted 30 April, 2025; originally announced April 2025.

  3. arXiv:2409.18239

    cs.SD cs.LG eess.AS

    Towards Sub-millisecond Latency Real-Time Speech Enhancement Models on Hearables

    Authors: Artem Dementyev, Chandan K. A. Reddy, Scott Wisdom, Navin Chatlani, John R. Hershey, Richard F. Lyon

    Abstract: Low latency models are critical for real-time speech enhancement applications, such as hearing aids and hearables. However, the sub-millisecond latency space for resource-constrained hearables remains underexplored. We demonstrate speech enhancement using a computationally efficient minimum-phase FIR filter, enabling sample-by-sample processing to achieve mean algorithmic latency of 0.32 ms to 1.2…

    Submitted 7 March, 2025; v1 submitted 26 September, 2024; originally announced September 2024.

  4. arXiv:2209.06358

    cs.SD cs.LG eess.AS

    Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset

    Authors: Michael Chinen, Jan Skoglund, Chandan K A Reddy, Alessandro Ragano, Andrew Hines

    Abstract: Non-reference speech quality models are important for a growing number of applications. The VoiceMOS 2022 challenge provided a dataset of synthetic voice conversion and text-to-speech samples with subjective labels. This study looks at the amount of variance that can be explained in subjective ratings of speech quality from metadata and the distribution imbalances of the dataset. Speech quality mo…

    Submitted 13 September, 2022; originally announced September 2022.

    Comments: Preprint; accepted for Interspeech 2022

  5. arXiv:2204.02249

    eess.AS

    A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality

    Authors: Alessandro Ragano, Emmanouil Benetos, Michael Chinen, Helard B. Martinez, Chandan K. A. Reddy, Jan Skoglund, Andrew Hines

    Abstract: Speech synthesis quality prediction has made remarkable progress with the development of supervised and self-supervised learning (SSL) MOS predictors, but some aspects related to the data are still unclear and require further study. In this paper, we evaluate several MOS predictors based on wav2vec 2.0 and the NISQA speech quality prediction model to explore the role of the training data, the influ…

    Submitted 24 November, 2023; v1 submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted at ISSC 2023

  6. arXiv:2110.04331

    eess.AS cs.SD

    MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

    Authors: Chandan K. A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

    Abstract: With the recent growth of remote work, online meetings often encounter challenging audio contexts such as background noise, music, and echo. Accurate real-time detection of music events can help to improve the user experience. In this paper, we present MusicNet, a compact neural model for detecting background music in the real-time communications pipeline. In video meetings, music frequently co-oc…

    Submitted 15 April, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

  7. arXiv:2110.01763

    eess.AS cs.SD

    DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

    Authors: Chandan K A Reddy, Vishak Gopal, Ross Cutler

    Abstract: Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. We have recently developed a non-intrusive speech quality metric called Deep Noise Suppression Mean Opinion Score (DNSMOS) using the scores from ITU-T Rec. P.808 subjective evaluation. The P.808 scores reflect the overall q…

    Submitted 4 February, 2022; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258

  8. arXiv:2101.09249

    eess.AS cs.SD

    Towards efficient models for real-time deep noise suppression

    Authors: Sebastian Braun, Hannes Gamper, Chandan K. A. Reddy, Ivan Tashev

    Abstract: With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain compact enough models, which are resource efficient during inference time. An important but ofte…

    Submitted 19 May, 2021; v1 submitted 22 January, 2021; originally announced January 2021.

  9. arXiv:2101.01902

    cs.SD cs.LG eess.AS

    Interspeech 2021 Deep Noise Suppression Challenge

    Authors: Chandan K A Reddy, Harishchandra Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH and ICASSP 2020. We open-sourced training and test datasets for the wideband scenario. We also open-sourced a subjective evaluation framework based on ITU-T standard P.808, wh…

    Submitted 4 April, 2021; v1 submitted 6 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2009.06122

  10. arXiv:2010.15258

    cs.SD cs.LG eess.AS

    DNSMOS: A Non-Intrusive Perceptual Objective Speech Quality metric to evaluate Noise Suppressors

    Authors: Chandan K A Reddy, Vishak Gopal, Ross Cutler

    Abstract: Human subjective evaluation is the gold standard to evaluate speech quality optimized for human perception. Perceptual objective metrics serve as a proxy for subjective scores. The conventional and widely used metrics require a reference clean speech signal, which is unavailable in real recordings. The no-reference approaches correlate poorly with human ratings and are not widely adopted in the re…

    Submitted 10 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: Submitted to ICASSP 2021

  11. arXiv:2009.06122

    eess.AS

    ICASSP 2021 Deep Noise Suppression Challenge

    Authors: Chandan K A Reddy, Harishchandra Dubey, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

    Abstract: The Deep Noise Suppression (DNS) challenge is designed to foster innovation in the area of noise suppression to achieve superior perceptual speech quality. We recently organized a DNS challenge special session at INTERSPEECH 2020. We open-sourced training and test datasets for researchers to train their noise suppression models. We also open-sourced a subjective evaluation framework and used the t…

    Submitted 26 October, 2020; v1 submitted 13 September, 2020; originally announced September 2020.

  12. arXiv:2005.13981

    eess.AS cs.LG cs.SD

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results

    Authors: Chandan K. A. Reddy, Vishak Gopal, Ross Cutler, Ebrahim Beyrami, Roger Cheng, Harishchandra Dubey, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression (DNS) Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. While the performanc…

    Submitted 18 October, 2020; v1 submitted 16 May, 2020; originally announced May 2020.

    Comments: Interspeech 2020. arXiv admin note: substantial text overlap with arXiv:2001.08662

  13. arXiv:2001.10601

    eess.AS cs.SD

    Weighted Speech Distortion Losses for Neural-network-based Real-time Speech Enhancement

    Authors: Yangyang Xia, Sebastian Braun, Chandan K. A. Reddy, Harishchandra Dubey, Ross Cutler, Ivan Tashev

    Abstract: This paper investigates several aspects of training an RNN (recurrent neural network) that impact the objective and subjective quality of enhanced speech for real-time single-channel speech enhancement. Specifically, we focus on an RNN that enhances short-time speech spectra on a single-frame-in, single-frame-out basis, a framework adopted by most classical signal processing methods. We propose two…

    Submitted 12 February, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

  14. arXiv:2001.09571

    eess.AS

    Noise dependent Super Gaussian-Coherence based dual microphone Speech Enhancement for hearing aid application using smartphone

    Authors: Nikhil Shankar, Gautam S Bhat, Chandan K A Reddy, Issa Panahi

    Abstract: In this paper, the coherence between speech and noise signals is used to obtain a Speech Enhancement (SE) gain function, in combination with a Super Gaussian Joint Maximum a Posteriori (SGJMAP) single microphone SE gain function. The proposed SE method can be implemented on a smartphone that works as an assistive device to hearing aids. Although the coherence SE gain function suppresses the background…

    Submitted 26 January, 2020; originally announced January 2020.

    Comments: 4 pages, 3 figures

  15. arXiv:2001.08662

    cs.SD cs.LG eess.AS

    The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Speech Quality and Testing Framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Harishchandra Dubey, Vishak Gopal, Roger Cheng, Ross Cutler, Sergiy Matusevych, Robert Aichner, Ashkan Aazami, Sebastian Braun, Puneet Rana, Sriram Srinivasan, Johannes Gehrke

    Abstract: The INTERSPEECH 2020 Deep Noise Suppression Challenge is intended to promote collaborative research in real-time single-channel Speech Enhancement aimed to maximize the subjective (perceptual) quality of the enhanced speech. A typical approach to evaluate the noise suppression methods is to use objective metrics on the test set obtained by splitting the original dataset. Many publications report r…

    Submitted 19 April, 2020; v1 submitted 23 January, 2020; originally announced January 2020.

    Comments: Details about Deep Noise Suppression Challenge

  16. arXiv:1909.08050

    cs.SD cs.LG eess.AS

    A scalable noisy speech dataset and online subjective test framework

    Authors: Chandan K. A. Reddy, Ebrahim Beyrami, Jamie Pool, Ross Cutler, Sriram Srinivasan, Johannes Gehrke

    Abstract: Background noise is a major source of quality impairments in Voice over Internet Protocol (VoIP) and Public Switched Telephone Network (PSTN) calls. Recent work shows the efficacy of deep learning for noise suppression, but the datasets have been relatively small compared to those used in other domains (e.g., ImageNet) and the associated evaluations have been more focused. In order to better facil…

    Submitted 17 September, 2019; originally announced September 2019.

    Comments: InterSpeech 2019

  17. arXiv:1907.01742

    cs.SD cs.LG eess.AS

    Supervised Classifiers for Audio Impairments with Noisy Labels

    Authors: Chandan K A Reddy, Ross Cutler, Johannes Gehrke

    Abstract: Voice-over-Internet-Protocol (VoIP) calls are prone to various speech impairments due to environmental and network conditions resulting in bad user experience. A reliable audio impairment classifier helps to identify the cause for bad audio quality. The user feedback after the call can act as the ground truth labels for training a supervised classifier on a large audio dataset. However, the labels…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: To appear in INTERSPEECH 2019

  18. An individualized super Gaussian single microphone Speech Enhancement for hearing aid users with smartphone as an assistive device

    Authors: Chandan K A Reddy, Nikhil Shankar, Gautam Bhat, Ram Charan, Issa Panahi

    Abstract: In this letter, we derive a new super Gaussian Joint Maximum a Posteriori based single microphone speech enhancement gain function. The developed Speech Enhancement method is implemented on a smartphone, and this arrangement functions as an assistive device to hearing aids. We introduce a tradeoff parameter in the derived gain function that allows the smartphone user to customize their listening p…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: 5 pages

  19. arXiv:1812.03914

    cs.SD eess.AS

    A Computationally Efficient and Practically Feasible Two Microphones Blind Speech Separation Method

    Authors: Chandan K A Reddy, Gautam Bhat, Nikhil Shankar, Issa Panahi

    Abstract: Traditionally, Blind Speech Separation techniques are computationally expensive as they update the demixing matrix at every time frame index, making them impractical to use in many Real-Time applications. In this paper, a robust data-driven two-microphone sound source localization method is used as a criterion to reduce the computational complexity of the Independent Vector Analysis (IVA) Blind Sp…

    Submitted 10 December, 2018; originally announced December 2018.

    Comments: 5 pages