Showing 1–18 of 18 results for author: Donley, J

  1. arXiv:2501.18157  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Efficient Audiovisual Speech Processing via MUTUD: Multimodal Training and Unimodal Deployment

    Authors: Joanna Hong, Sanjeel Parekh, Honglie Chen, Jacob Donley, Ke Tan, Buye Xu, Anurag Kumar

    Abstract: Building reliable speech systems often requires combining multiple modalities, like audio and visual cues. While such multimodal solutions frequently lead to improvements in performance and may even be critical in certain cases, they come with several constraints such as increased sensory requirements, computational cost, and modality synchronization, to mention a few. These challenges constrain t…

    Submitted 30 January, 2025; originally announced January 2025.

  2. arXiv:2409.11731  [pdf, other]

    eess.AS cs.SD

    Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays

    Authors: Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, Boaz Rafaely

    Abstract: The increasing popularity of spatial audio in applications such as teleconferencing, entertainment, and virtual reality has led to the recent developments of binaural reproduction methods. However, only a few of these methods are well-suited for wearable and mobile arrays, which typically consist of a small number of microphones. One such method is binaural signal matching (BSM), which has been sh…

    Submitted 14 February, 2025; v1 submitted 18 September, 2024; originally announced September 2024.

  3. arXiv:2409.11494  [pdf, other]

    eess.AS cs.SD

    M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

    Authors: Yufeng Yang, Desh Raj, Ju Lin, Niko Moritz, Junteng Jia, Gil Keren, Egor Lakomkin, Yiteng Huang, Jacob Donley, Jay Mahadeokar, Ozlem Kalinli

    Abstract: The growing popularity of multi-channel wearable devices, such as smart glasses, has led to a surge of applications such as targeted speech recognition and enhanced hearing. However, current approaches to solve these tasks use independently trained models, which may not benefit from large amounts of unlabeled data. In this paper, we propose M-BEST-RQ, the first multi-channel speech foundation mode…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: In submission to IEEE ICASSP 2025

  4. Design and Analysis of Binaural Signal Matching with Arbitrary Microphone Arrays and Listener Head Rotations

    Authors: Lior Madmoni, Zamir Ben-Hur, Jacob Donley, Vladimir Tourbabin, Boaz Rafaely

    Abstract: Binaural reproduction is rapidly becoming a topic of great interest in the research community, especially with the surge of new and popular devices, such as virtual reality headsets, smart glasses, and head-tracked headphones. In order to immerse the listener in a virtual or remote environment with such devices, it is essential to generate realistic and accurate binaural signals. This is challengi…

    Submitted 29 April, 2025; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Published in EURASIP Journal on Audio, Speech, and Music Processing

  5. Ambisonics Encoding For Arbitrary Microphone Arrays Incorporating Residual Channels For Binaural Reproduction

    Authors: Yhonatan Gayer, Vladimir Tourbabin, Zamir Ben-Hur, Jacob Donley, Boaz Rafaely

    Abstract: In the rapidly evolving fields of virtual and augmented reality, accurate spatial audio capture and reproduction are essential. For these applications, Ambisonics has emerged as a standard format. However, existing methods for encoding Ambisonics signals from arbitrary microphone arrays face challenges, such as errors due to the irregular array configurations and limited spatial resolution resulti…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted for presentation at HSCMA 2024

  6. arXiv:2401.07882  [pdf, other]

    cs.SD eess.AS

    On the Importance of Neural Wiener Filter for Resource Efficient Multichannel Speech Enhancement

    Authors: Tsun-An Hsieh, Jacob Donley, Daniel Wong, Buye Xu, Ashutosh Pandey

    Abstract: We introduce a time-domain framework for efficient multichannel speech enhancement, emphasizing low latency and computational efficiency. This framework incorporates two compact deep neural networks (DNNs) surrounding a multichannel neural Wiener filter (NWF). The first DNN enhances the speech signal to estimate NWF coefficients, while the second DNN refines the output from the NWF. The NWF, while…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted for publication at ICASSP

  7. arXiv:2311.18689  [pdf, other]

    eess.AS cs.SD eess.SP

    Subspace Hybrid MVDR Beamforming for Augmented Hearing

    Authors: Sina Hafezi, Alastair H. Moore, Pierre H. Guiraud, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner

    Abstract: Signal-dependent beamformers are advantageous over signal-independent beamformers when the acoustic scenario - be it real-world or simulated - is straightforward in terms of the number of sound sources, the ambient sound field and their dynamics. However, in the context of augmented reality audio using head-worn microphone arrays, the acoustic scenarios encountered are often far from straightforwa…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 14 pages, 10 figures, submitted for IEEE/ACM Transactions on Audio, Speech, and Language Processing on 23-Nov-2023

  8. arXiv:2311.13390  [pdf, other]

    eess.AS

    Performance Analysis Of Binaural Signal Matching (BSM) in the Time-Frequency Domain

    Authors: Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, Boaz Rafaely

    Abstract: The capture and reproduction of spatial audio is becoming increasingly popular, with the mushrooming of applications in teleconferencing, entertainment and virtual reality. Many binaural reproduction methods have been developed and studied extensively for spherical and other specially designed arrays. However, the recent increased popularity of wearable and mobile arrays requires the development o…

    Submitted 23 November, 2023; v1 submitted 22 November, 2023; originally announced November 2023.

    Journal ref: in Proceedings of the 24th International Congress on Acoustics (ICA 2022), ABS-0302, 2022

  9. arXiv:2303.08967  [pdf, other]

    eess.AS eess.SP

    Subspace Hybrid Beamforming for Head-worn Microphone Arrays

    Authors: Sina Hafezi, Alastair H. Moore, Pierre Guiraud, Patrick A. Naylor, Jacob Donley, Vladimir Tourbabin, Thomas Lunner

    Abstract: A two-stage multi-channel speech enhancement method is proposed which consists of a novel adaptive beamformer, Hybrid Minimum Variance Distortionless Response (MVDR), Isotropic-MVDR (Iso), and a novel multi-channel spectral Principal Components Analysis (PCA) denoising. In the first stage, the Hybrid-MVDR performs multiple MVDRs using a dictionary of pre-defined noise field models and picks the mi…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 5 pages, 4 figures, accepted for ICASSP 2023

  10. arXiv:2212.11377  [pdf, other]

    eess.AS cs.CV cs.LG cs.SD

    ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement

    Authors: Wei-Ning Hsu, Tal Remez, Bowen Shi, Jacob Donley, Yossi Adi

    Abstract: Prior works on improving speech quality with visual input typically study each type of auditory distortion separately (e.g., separation, inpainting, video-to-speech) and present tailored algorithms. This paper proposes to unify these subjects and study Generalized Speech Enhancement, where the goal is not to reconstruct the exact reference clean signal, but to focus on improving certain aspects of…

    Submitted 21 December, 2022; originally announced December 2022.

  11. arXiv:2211.10999  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS

    LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders

    Authors: Rodrigo Mira, Buye Xu, Jacob Donley, Anurag Kumar, Stavros Petridis, Vamsi Krishna Ithapu, Maja Pantic

    Abstract: Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use…

    Submitted 13 March, 2023; v1 submitted 20 November, 2022; originally announced November 2022.

    Comments: accepted to ICASSP 2023

  12. arXiv:2202.00538  [pdf, other]

    cs.SD cs.CV eess.AS

    The impact of removing head movements on audio-visual speech enhancement

    Authors: Zhiqi Kang, Mostafa Sadeghi, Radu Horaud, Xavier Alameda-Pineda, Jacob Donley, Anurag Kumar

    Abstract: This paper investigates the impact of head movements on audio-visual speech enhancement (AVSE). Although being a common conversational feature, head movements have been ignored by past and recent studies: they challenge today's learning-based methods as they often degrade the performance of models that are trained on clean, frontal, and steady face images. To alleviate this problem, we propose to…

    Submitted 2 February, 2022; v1 submitted 1 February, 2022; originally announced February 2022.

  13. arXiv:2112.04613  [pdf, other]

    cs.SD eess.AS

    NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers

    Authors: Jonah Casebeer, Jacob Donley, Daniel Wong, Buye Xu, Anurag Kumar

    Abstract: Estimating a time-varying spatial covariance matrix for a beamforming algorithm is a challenging task, especially for wearable devices, as the algorithm must compensate for time-varying signal statistics due to rapid pose-changes. In this paper, we propose Neural Integrated Covariance Estimators for Beamformers, NICE-Beam. NICE-Beam is a general technique for learning how to estimate time-varying…

    Submitted 8 December, 2021; originally announced December 2021.

  14. arXiv:2110.13130  [pdf, other]

    cs.SD eess.AS

    Multichannel Speech Enhancement without Beamforming

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: Deep neural networks are often coupled with traditional spatial filters, such as MVDR beamformers for effectively exploiting spatial information. Even though single-stage end-to-end supervised models can obtain impressive enhancement, combining them with a traditional beamformer and a DNN-based post-filter in a multistage processing provides additional improvements. In this work, we propose a two-…

    Submitted 6 April, 2022; v1 submitted 25 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in ICASSP 2022

  15. arXiv:2110.11844  [pdf, other]

    cs.SD eess.AS

    Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: Deep neural networks (DNNs) are very effective for multichannel speech enhancement with fixed array geometries. However, it is not trivial to use DNNs for ad-hoc arrays with unknown order and placement of microphones. We propose a novel triple-path network for ad-hoc array processing in the time domain. The key idea in the network design is to divide the overall processing into spatial processing…

    Submitted 4 July, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in INTERSPEECH 2022

  16. arXiv:2110.10757  [pdf, other]

    cs.SD eess.AS

    TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

    Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

    Abstract: In this work, we propose a new model called triple-path attentive recurrent network (TPARN) for multichannel speech enhancement in the time domain. TPARN extends a single-channel dual-path network to a multichannel network by adding a third path along the spatial dimension. First, TPARN processes speech signals from all channels independently using a dual-path attentive recurrent network (ARN), wh…

    Submitted 6 April, 2022; v1 submitted 20 October, 2021; originally announced October 2021.

    Comments: Accepted for publication in ICASSP 2022

  17. arXiv:2107.04174  [pdf, other]

    cs.SD cs.CV cs.LG eess.AS eess.SP

    EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments

    Authors: Jacob Donley, Vladimir Tourbabin, Jung-Suk Lee, Mark Broyles, Hao Jiang, Jie Shen, Maja Pantic, Vamsi Krishna Ithapu, Ravish Mehra

    Abstract: Augmented Reality (AR) as a platform has the potential to facilitate the reduction of the cocktail party effect. Future AR headsets could potentially leverage information from an array of sensors spanning many different modalities. Training and testing signal processing and machine learning algorithms on tasks such as beam-forming and speech enhancement require high quality representative data. To…

    Submitted 18 October, 2021; v1 submitted 8 July, 2021; originally announced July 2021.

    Comments: Dataset is available at: https://github.com/facebookresearch/EasyComDataset

  18. arXiv:2102.06934  [pdf, other]

    cs.SD eess.AS

    Multi-Channel Speech Enhancement using Graph Neural Networks

    Authors: Panagiotis Tzirakis, Anurag Kumar, Jacob Donley

    Abstract: Multi-channel speech enhancement aims to extract clean speech from a noisy mixture using signals captured from multiple microphones. Recently proposed methods tackle this problem by incorporating deep neural network models with spatial filtering techniques such as the minimum variance distortionless response (MVDR) beamformer. In this paper, we introduce a different research direction by viewing e…

    Submitted 13 February, 2021; originally announced February 2021.

    Journal ref: Proc. ICASSP 2021