Showing 1–5 of 5 results for author: Azcarreta, J

  1. arXiv:2505.24498  [pdf, ps, other]

    cs.LG

    Efficient Neural and Numerical Methods for High-Quality Online Speech Spectrogram Inversion via Gradient Theorem

    Authors: Andres Fernandez, Juan Azcarreta, Cagdas Bilen, Jesus Monge Alvarez

    Abstract: Recent work in online speech spectrogram inversion effectively combines Deep Learning with the Gradient Theorem to predict phase derivatives directly from magnitudes. Then, phases are estimated from their derivatives via least squares, resulting in a high quality reconstruction. In this work, we introduce three innovations that drastically reduce computational cost, while maintaining high quality:…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted at InterSpeech 2025

  2. arXiv:2505.15914  [pdf, ps, other]

    cs.SD eess.AS

    A Novel Deep Learning Framework for Efficient Multichannel Acoustic Feedback Control

    Authors: Yuan-Kuei Wu, Juan Azcarreta, Kashyap Patel, Buye Xu, Jung-Suk Lee, Sanha Lee, Ashutosh Pandey

    Abstract: This study presents a deep-learning framework for controlling multichannel acoustic feedback in audio devices. Traditional digital signal processing methods struggle with convergence when dealing with highly correlated noise such as feedback. We introduce a Convolutional Recurrent Network that efficiently combines spatial and temporal processing, significantly enhancing speech enhancement capabili…

    Submitted 29 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted by Interspeech 2025

  3. arXiv:2407.04879  [pdf, other]

    cs.SD eess.AS

    All Neural Low-latency Directional Speech Extraction

    Authors: Ashutosh Pandey, Sanha Lee, Juan Azcarreta, Daniel Wong, Buye Xu

    Abstract: We introduce a novel all neural model for low-latency directional speech extraction. The model uses direction of arrival (DOA) embeddings from a predefined spatial grid, which are transformed and fused into a recurrent neural network based speech extraction model. This process enables the model to effectively extract speech from a specified DOA. Unlike previous methods that relied on hand-crafted…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted for publication at INTERSPEECH 2024

  4. arXiv:2010.13648  [pdf, other]

    eess.AS cs.SD

    Improving Sound Event Detection Metrics: Insights from DCASE 2020

    Authors: Giacomo Ferroni, Nicolas Turpault, Juan Azcarreta, Francesco Tuveri, Romain Serizel, Çagdaş Bilen, Sacha Krstulović

    Abstract: The ranking of sound event detection (SED) systems may be biased by assumptions inherent to evaluation criteria and to the choice of an operating point. This paper compares conventional event-based and segment-based criteria against the Polyphonic Sound Detection Score (PSDS)'s intersection-based criterion, over a selection of systems from DCASE 2020 Challenge Task 4. It shows that, by relying on…

    Submitted 26 October, 2020; originally announced October 2020.

  5. arXiv:1910.08440  [pdf, other]

    eess.AS cs.SD

    A Framework for the Robust Evaluation of Sound Event Detection

    Authors: Cagdas Bilen, Giacomo Ferroni, Francesco Tuveri, Juan Azcarreta, Sacha Krstulovic

    Abstract: This work defines a new framework for performance evaluation of polyphonic sound event detection (SED) systems, which overcomes the limitations of the conventional collar-based event decisions, event F-scores and event error rates. The proposed framework introduces a definition of event detection that is more robust against labelling subjectivity. It also resorts to polyphonic receiver operating c…

    Submitted 14 February, 2020; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020