Showing 1–9 of 9 results for author: Williamson, D S

Searching in archive cs.
  1. arXiv:2410.18395

    cs.LG cs.SD eess.AS

    A contrastive-learning approach for auditory attention detection

    Authors: Seyed Ali Alavi Bajestan, Mark Pitt, Donald S. Williamson

    Abstract: Carrying conversations in multi-sound environments is one of the more challenging tasks, since the sounds overlap across time and frequency making it difficult to understand a single sound source. One proposed approach to help isolate an attended speech source is through decoding the electroencephalogram (EEG) and identifying the attended audio source using statistical or machine learning techniqu…

    Submitted 23 October, 2024; originally announced October 2024.

  2. arXiv:2410.13182

    eess.AS cs.SD

    Using RLHF to align speech enhancement approaches to mean-opinion quality scores

    Authors: Anurag Kumar, Andrew Perrault, Donald S. Williamson

    Abstract: Objective speech quality measures are typically used to assess speech enhancement algorithms, but it has been shown that they are sub-optimal as learning objectives because they do not always align well with human subjective ratings. This misalignment often results in noticeable distortions and artifacts that cause speech enhancement to be ineffective. To address these issues, we propose a reinfor…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: Submitted to ICASSP 2025

  3. arXiv:2305.00104

    cs.CV eess.AS eess.IV

    MMViT: Multiscale Multiview Vision Transformers

    Authors: Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, Rui Hou, Madian Khabsa, Kaiyue Yang, David Liu, Donald S. Williamson, Hanchao Yu

    Abstract: We present Multiscale Multiview Vision Transformers (MMViT), which introduces multiscale feature maps and multiview encodings to transformer models. Our model encodes different views of the input signal and builds several channel-resolution feature stages to process the multiple views of the input at different resolutions in parallel. At each scale stage, we use a cross-attention block to fuse inf…

    Submitted 28 April, 2023; originally announced May 2023.

  4. arXiv:2302.04932

    eess.AS cs.SD

    A Composite T60 Regression and Classification Approach for Speech Dereverberation

    Authors: Yuying Li, Yuchen Liu, Donald S. Williamson

    Abstract: Dereverberation is often performed directly on the reverberant audio signal, without knowledge of the acoustic environment. Reverberation time, T60, however, is an essential acoustic factor that reflects how reverberation may impact a signal. In this work, we propose to perform dereverberation while leveraging key acoustic information from the environment. More specifically, we develop a joint lea…

    Submitted 9 February, 2023; originally announced February 2023.

  5. arXiv:2203.16032

    cs.SD eess.AS

    ConferencingSpeech 2022 Challenge: Non-intrusive Objective Speech Quality Assessment (NISQA) Challenge for Online Conferencing Applications

    Authors: Gaoxiong Yi, Wei Xiao, Yiming Xiao, Babak Naderi, Sebastian Möller, Wafaa Wardah, Gabriel Mittag, Ross Cutler, Zhuohuang Zhang, Donald S. Williamson, Fei Chen, Fuzheng Yang, Shidong Shang

    Abstract: With the advances in speech communication systems such as online conferencing applications, we can seamlessly work with people regardless of where they are. However, during online meetings, speech quality can be significantly affected by background noise, reverberation, packet loss, network jitter, etc. Because of its nature, speech quality is traditionally assessed in subjective tests in laborato…

    Submitted 31 March, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

  6. arXiv:2012.13442

    eess.AS cs.SD

    Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

    Authors: Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu

    Abstract: Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems. Minimum variance distortionless response (MVDR) filters are often adopted to remove nonlinear distortions, however, conventional neural mask-based MVDR systems s…

    Submitted 15 November, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

    Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP); Demos available at https://zzhang68.github.io/mcmf-adl-mvdr/

  7. arXiv:2007.15797

    eess.AS cs.CL cs.LG cs.MM cs.SD

    A Pyramid Recurrent Network for Predicting Crowdsourced Speech-Quality Ratings of Real-World Signals

    Authors: Xuan Dong, Donald S. Williamson

    Abstract: The real-world capabilities of objective speech quality measures are limited since current measures (1) are developed from simulated data that does not adequately model real environments; or they (2) predict objective scores that are not always strongly correlated with subjective ratings. Additionally, a large dataset of real-world signals with listener quality ratings does not currently exist, wh…

    Submitted 30 July, 2020; originally announced July 2020.

    Comments: Proceeding of INTERSPEECH

  8. arXiv:2007.14986

    eess.AS cs.SD

    Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners

    Authors: Zhuohuang Zhang, Donald S. Williamson, Yi Shen

    Abstract: Phase serves as a critical component of speech that influences the quality and intelligibility. Current speech enhancement algorithms are beginning to address phase distortions, but the algorithms focus on normal-hearing (NH) listeners. It is not clear whether phase enhancement is beneficial for hearing-impaired (HI) listeners. We investigated the influence of phase distortion on speech quality th…

    Submitted 29 July, 2020; originally announced July 2020.

    Comments: accepted by Interspeech2020, 5 pages, 4 figures

  9. arXiv:2007.14974

    eess.AS cs.SD

    On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

    Authors: Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li

    Abstract: Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement, however, these approaches have not been compared to state-of-the-art (SOTA) non GAN-based approaches. Additionally, many loss functions have been proposed for GAN-based approaches, but they have not been adequately compared. In this study, we propose novel convolutional recurrent GAN (CR…

    Submitted 26 December, 2020; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: accepted by Interspeech2020, 5 pages, 2 figures