
Showing 1–14 of 14 results for author: Manocha, P

Searching in archive cs.
  1. arXiv:2310.09388  [pdf, other]

    eess.AS cs.LG cs.SD

    CORN: Co-Trained Full- And No-Reference Speech Quality Assessment

    Authors: Pranay Manocha, Donald Williamson, Adam Finkelstein

    Abstract: Perceptual evaluation constitutes a crucial aspect of various audio-processing tasks. Full reference (FR) or similarity-based metrics rely on high-quality reference recordings, to which lower-quality or corrupted versions of the recording may be compared for evaluation. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Both the FR and NR approaches exhibit…

    Submitted 8 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  2. arXiv:2206.13411  [pdf, other]

    eess.AS cs.SD

    Audio Similarity is Unreliable as a Proxy for Audio Quality

    Authors: Pranay Manocha, Zeyu Jin, Adam Finkelstein

    Abstract: Many audio processing tasks require perceptual assessment. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. Most applications incorporate full reference or other similarity-based metrics (e.g. PESQ) that depend on a clean reference. Researchers have relied on such metrics to evaluate and compare various proposed methods, often conclu…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  3. arXiv:2206.12297  [pdf, other]

    eess.AS cs.SD

    SAQAM: Spatial Audio Quality Assessment Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Audio quality assessment is critical for assessing the perceptual realism of sounds. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. For AR/VR, good perceived sound quality and localizability of sources are among the key elements to ensure complete immersion of the user. Our work introduces SAQAM, which uses a multi-task learning fra…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  4. arXiv:2206.12285  [pdf, other]

    eess.AS cs.SD

    Speech Quality Assessment through MOS using Non-Matching References

    Authors: Pranay Manocha, Anurag Kumar

    Abstract: Human judgments obtained through Mean Opinion Scores (MOS) are the most reliable way to assess the quality of speech signals. However, several recent attempts to automatically estimate MOS using deep learning approaches lack robustness and generalization capabilities, limiting their use in real-world applications. In this work, we present a novel framework, NORESQA-MOS, for estimating the MOS of a…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  5. arXiv:2205.02116  [pdf, other]

    cs.CR cs.LG

    Optimizing One-pixel Black-box Adversarial Attacks

    Authors: Tianxun Zhou, Shubhankar Agrawal, Prateek Manocha

    Abstract: The output of Deep Neural Networks (DNN) can be altered by a small perturbation of the input in a black-box setting by making multiple calls to the DNN. However, the high computation and time required make the existing approaches unusable. This work seeks to improve the One-pixel (few-pixel) black-box adversarial attacks to reduce the number of calls to the network under attack. The One-pixel att…

    Submitted 30 April, 2022; originally announced May 2022.

    Comments: 9 pages, 4 figures

  6. arXiv:2203.03022  [pdf, ps, other]

    cs.SD cs.AI cs.LG eess.AS stat.ML

    HEAR: Holistic Evaluation of Audio Representations

    Authors: Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk

    Abstract: What audio embedding approach generalizes best to a wide range of downstream tasks across a variety of everyday domains without fine-tuning? The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios. HEAR evaluates audio representations using a benchmark suite across a variety of domains, in…

    Submitted 29 May, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

    Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

    Journal ref: Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track

  7. arXiv:2109.08125  [pdf, other]

    eess.AS cs.SD

    NORESQA: A Framework for Speech Quality Assessment using Non-Matching References

    Authors: Pranay Manocha, Buye Xu, Anurag Kumar

    Abstract: The perceptual task of speech quality assessment (SQA) is challenging for machines. Objective SQA methods that rely on the availability of the corresponding clean reference have been the primary go-to approaches for SQA. Clearly, these methods fail in real-world scenarios where the ground truth clean references are not available. In recent years, non-intrusive methods that train neura…

    Submitted 18 October, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

  8. DPLM: A Deep Perceptual Spatial-Audio Localization Metric

    Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

    Abstract: Subjective evaluations are critical for assessing the perceptual realism of sounds in audio-synthesis driven technologies like augmented and virtual reality. However, they are challenging to set up, fatiguing for users, and expensive. In this work, we tackle the problem of capturing the perceptual characteristics of localizing sounds. Specifically, we propose a framework for building a general pur…

    Submitted 28 May, 2021; originally announced May 2021.

  9. arXiv:2102.05109  [pdf, other]

    eess.AS cs.LG cs.SD

    CDPAM: Contrastive learning for perceptual audio similarity

    Authors: Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

    Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbatio…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Dataset, code and sound examples can be found at https://github.com/pranaymanocha/PerceptualAudio/tree/master/cdpam

  10. arXiv:2011.01114  [pdf, other]

    cs.CV cs.MM

    Facial Keypoint Sequence Generation from Audio

    Authors: Prateek Manocha, Prithwijit Guha

    Abstract: Whenever we speak, our voice is accompanied by facial movements and expressions. Several recent works have shown the synthesis of highly photo-realistic videos of talking faces, but they either require a source video to drive the target face or only generate videos with a fixed head pose. This lack of facial movement is because most of these works focus on the lip movement in sync with the audio w…

    Submitted 2 November, 2020; originally announced November 2020.

  11. arXiv:2001.04460  [pdf, other]

    eess.AS cs.SD

    A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

    Authors: Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

    Abstract: Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a…

    Submitted 18 May, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: Dataset, code and sound examples can be found at https://pixl.cs.princeton.edu/pubs/Manocha_2020_ADP/

  12. arXiv:1710.11309  [pdf]

    cs.CV

    Tumor Classification and Segmentation of MR Brain Images

    Authors: Tanvi Gupta, Pranay Manocha, Tapan K. Gandhi, RK Gupta, BK Panigrahi

    Abstract: The diagnosis and segmentation of tumors using any medical diagnostic tool can be challenging due to the varying nature of this pathology. Magnetic Resonance Imaging (MRI) is an established diagnostic tool for various diseases and disorders and plays a major role in clinical neuro-diagnosis. Supplementing this technique with automated classification and segmentation tools is gaining importance,…

    Submitted 30 October, 2017; originally announced October 2017.

  13. arXiv:1710.11121  [pdf]

    cs.CV

    Automated Tumor Segmentation and Brain Mapping for the Tumor Area

    Authors: Pranay Manocha, Snehal Bhasme, Tanvi Gupta, BK Panigrahi, Tapan K. Gandhi

    Abstract: Magnetic Resonance Imaging (MRI) is an important diagnostic tool for precise detection of various pathologies. Magnetic Resonance (MR) is preferred over Computed Tomography (CT) because the high resolution of MR images helps in better detection of neurological conditions. Graphical user interface (GUI) aided disease detection has become increasingly useful due to the increasing workload o…

    Submitted 28 October, 2017; originally announced October 2017.

  14. arXiv:1710.10974  [pdf, other]

    cs.SD cs.IR eess.AS

    Content-based Representations of audio using Siamese neural networks

    Authors: Pranay Manocha, Rohan Badlani, Anurag Kumar, Ankit Shah, Benjamin Elizalde, Bhiksha Raj

    Abstract: In this paper, we focus on the problem of content-based retrieval for audio, which aims to retrieve all semantically similar audio recordings for a given audio clip query. This problem is similar to the problem of query by example of audio, which aims to retrieve media samples from a database, which are similar to the user-provided example. We propose a novel approach which encodes the audio into…

    Submitted 15 February, 2018; v1 submitted 30 October, 2017; originally announced October 2017.