Showing 1–8 of 8 results for author: Nugraha, A A

Searching in archive cs.
  1. arXiv:2410.22805  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising

    Authors: Yoto Fujita, Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes speech enhancement for realtime automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: Accepted to APSIPA2024

  2. arXiv:2306.10240  [pdf, other]

    cs.SD cs.LG eess.AS

    Neural Fast Full-Rank Spatial Covariance Analysis for Blind Source Separation

    Authors: Yoshiaki Bando, Yoshiki Masuyama, Aditya Arie Nugraha, Kazuyoshi Yoshii

    Abstract: This paper describes an efficient unsupervised learning method for a neural source separation model that utilizes a probabilistic generative model of observed multichannel mixtures proposed for blind source separation (BSS). For this purpose, amortized variational inference (AVI) has been used for directly solving the inverse problem of BSS with full-rank spatial covariance analysis (FCA). Althoug…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: 5 pages, 2 figures, accepted to EUSIPCO 2023

  3. arXiv:2305.04447  [pdf, other]

    eess.AS cs.SD

    Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Source Positions

    Authors: Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound source separation and localization, essential as the front-end of speech recognition. Classical approaches to interpolation rely on linear weighting of…

    Submitted 1 March, 2024; v1 submitted 7 May, 2023; originally announced May 2023.

    Comments: Camera ready version for HSCMA 24 at ICASSP 24

  4. arXiv:2207.10934  [pdf, other]

    eess.AS cs.SD

    DNN-Free Low-Latency Adaptive Speech Enhancement Based on Frame-Online Beamforming Powered by Block-Online FastMNMF

    Authors: Aditya Arie Nugraha, Kouhei Sekiguchi, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a practical dual-process speech enhancement system that adapts environment-sensitive frame-online beamforming (front-end) with help from environment-free block-online source separation (back-end). To use minimum variance distortionless response (MVDR) beamforming, one may train a deep neural network (DNN) that estimates time-frequency masks used for computing the covariance ma…

    Submitted 22 July, 2022; originally announced July 2022.

    Comments: IWAENC 2022

  5. arXiv:2207.07296  [pdf, other]

    eess.AS cs.LG cs.SD

    Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yicheng Du, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations made in real noisy echoic environments (e.g., cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) that works well i…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: IEEE/RSJ IROS 2022

  6. arXiv:2207.07273  [pdf, other]

    eess.AS cs.LG cs.SD

    Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

    Authors: Yicheng Du, Aditya Arie Nugraha, Kouhei Sekiguchi, Yoshiaki Bando, Mathieu Fontaine, Kazuyoshi Yoshii

    Abstract: This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is to sequentially perform speech enhancement and automatic speech recognition (ASR) based on deep neural networks (DNNs) trained in a supervised manner. In our ta…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: INTERSPEECH 2022

  7. arXiv:1903.03269  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    A Deep Generative Model of Speech Complex Spectrograms

    Authors: Aditya Arie Nugraha, Kouhei Sekiguchi, Kazuyoshi Yoshii

    Abstract: This paper proposes an approach to the joint modeling of the short-time Fourier transform magnitude and phase spectrograms with a deep generative model. We assume that the magnitude follows a Gaussian distribution and the phase follows a von Mises distribution. To improve the consistency of the phase values in the time-frequency domain, we also apply the von Mises distribution to the phase derivat…

    Submitted 7 March, 2019; originally announced March 2019.

  8. arXiv:1903.03237  [pdf, ps, other]

    cs.SD cs.LG eess.AS stat.ML

    Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices

    Authors: Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, Kazuyoshi Yoshii

    Abstract: This paper describes a versatile method that accelerates multichannel source separation methods based on full-rank spatial modeling. A popular approach to multichannel source separation is to integrate a spatial model with a source model for estimating the spatial covariance matrices (SCMs) and power spectral densities (PSDs) of each sound source in the time-frequency domain. One of the most succe…

    Submitted 7 March, 2019; originally announced March 2019.