
Showing 1–5 of 5 results for author: Finkelstein, A

Searching in archive eess.
  1. arXiv:2310.09388 [pdf, other]

    eess.AS cs.LG cs.SD

    CORN: Co-Trained Full- And No-Reference Speech Quality Assessment

    Authors: Pranay Manocha, Donald Williamson, Adam Finkelstein

    Abstract: Perceptual evaluation constitutes a crucial aspect of various audio-processing tasks. Full reference (FR) or similarity-based metrics rely on high-quality reference recordings, to which lower-quality or corrupted versions of the recording may be compared for evaluation. In contrast, no-reference (NR) metrics evaluate a recording without relying on a reference. Both the FR and NR approaches exhibit…

    Submitted 8 January, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

  2. arXiv:2206.13411 [pdf, other]

    eess.AS cs.SD

    Audio Similarity is Unreliable as a Proxy for Audio Quality

    Authors: Pranay Manocha, Zeyu Jin, Adam Finkelstein

    Abstract: Many audio processing tasks require perceptual assessment. However, the time and expense of obtaining "gold standard" human judgments limit the availability of such data. Most applications incorporate full reference or other similarity-based metrics (e.g. PESQ) that depend on a clean reference. Researchers have relied on such metrics to evaluate and compare various proposed methods, often conclu…

    Submitted 27 June, 2022; originally announced June 2022.

    Comments: To Appear, Interspeech 2022

  3. arXiv:2102.05109 [pdf, other]

    eess.AS cs.LG cs.SD

    CDPAM: Contrastive learning for perceptual audio similarity

    Authors: Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

    Abstract: Many speech processing methods based on deep learning require an automatic and differentiable audio metric for the loss function. The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception. However, it requires a large number of human annotations and does not generalize well outside the range of perturbatio…

    Submitted 9 February, 2021; originally announced February 2021.

    Comments: Dataset, code and sound examples can be found at https://github.com/pranaymanocha/PerceptualAudio/tree/master/cdpam

  4. arXiv:2006.05694 [pdf, other]

    eess.AS cs.LG cs.SD

    HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

    Authors: Jiaqi Su, Zeyu Jin, Adam Finkelstein

    Abstract: Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion. This paper introduces HiFi-GAN, a deep learning method to transform recorded speech to sound as though it had been recorded in a studio. We use an end-to-end feed-forward WaveNet architecture, trained with multi-scale adversarial discriminators in both the time domain and the time-f…

    Submitted 21 September, 2020; v1 submitted 10 June, 2020; originally announced June 2020.

    Comments: Accepted by INTERSPEECH 2020

  5. arXiv:2001.04460 [pdf, other]

    eess.AS cs.SD

    A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

    Authors: Pranay Manocha, Adam Finkelstein, Richard Zhang, Nicholas J. Bryan, Gautham J. Mysore, Zeyu Jin

    Abstract: Many audio processing tasks require perceptual assessment. The "gold standard" of obtaining human judgments is time-consuming, expensive, and cannot be used as an optimization criterion. On the other hand, automated metrics are efficient to compute but often correlate poorly with human judgment, particularly for audio differences at the threshold of human detection. In this work, we construct a…

    Submitted 18 May, 2020; v1 submitted 13 January, 2020; originally announced January 2020.

    Comments: Dataset, code and sound examples can be found at https://pixl.cs.princeton.edu/pubs/Manocha_2020_ADP/