
Showing 1–3 of 3 results for author: Pasa, L

Searching in archive eess.
  1. arXiv:1912.02671  [pdf, other]

    eess.AS cs.LG eess.IV

    Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras

    Authors: Ander Arriandiaga, Giovanni Morrone, Luca Pasa, Leonardo Badino, Chiara Bartolozzi

    Abstract: We propose a method to address audio-visual target speaker enhancement in multi-talker environments using event-driven cameras. State-of-the-art audio-visual speech separation methods show that crucial information is the movement of the facial landmarks related to speech production. However, all approaches proposed so far work offline, using frame-based video input, making it difficult to process…

    Submitted 22 February, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Accepted at ISCAS 2021

  2. arXiv:1904.08248  [pdf, ps, other]

    eess.AS cs.CL cs.SD stat.ML

    An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR

    Authors: Luca Pasa, Giovanni Morrone, Leonardo Badino

    Abstract: In this paper, we analyzed how audio-visual speech enhancement can help to perform the ASR task in a cocktail party scenario. Therefore, we considered two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and phone recognition, respectively. Then, we studied how the two models interact, and how training them jointly affects the final result. We analyzed…

    Submitted 27 November, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

  3. arXiv:1811.02480  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments

    Authors: Giovanni Morrone, Luca Pasa, Vadim Tikhanoff, Sonia Bergamaschi, Luciano Fadiga, Leonardo Badino

    Abstract: In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks a…

    Submitted 2 May, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)