Showing 1–7 of 7 results for author: Badino, L

  1. arXiv:2409.05750  [pdf, other]

    eess.AS cs.MM

    A Toolkit for Joint Speaker Diarization and Identification with Application to Speaker-Attributed ASR

    Authors: Giovanni Morrone, Enrico Zovato, Fabio Brugnara, Enrico Sartori, Leonardo Badino

    Abstract: We present a modular toolkit to perform joint speaker diarization and speaker identification. The toolkit can leverage multiple models and algorithms, which are defined in a configuration file. Such flexibility allows our system to work properly in various conditions (e.g., multiple registered speakers' sets, acoustic conditions and languages) and across application domains (e.g. media monitorin…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Show and Tell paper. Presented at Interspeech 2024

    Journal ref: Proceedings of Interspeech 2024, pp. 3652–3653

  2. arXiv:2406.09290  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    Exploring Spoken Language Identification Strategies for Automatic Transcription of Multilingual Broadcast and Institutional Speech

    Authors: Martina Valente, Fabio Brugnara, Giovanni Morrone, Enrico Zovato, Leonardo Badino

    Abstract: This paper addresses spoken language identification (SLI) and speech recognition of multilingual broadcast and institutional speech, real application scenarios that have been rarely addressed in the SLI literature. Observing that in these domains language changes are mostly associated with speaker changes, we propose a cascaded system consisting of speaker diarization and language identification a…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  3. Interpretable Dysarthric Speaker Adaptation based on Optimal-Transport

    Authors: Rosanna Turrisi, Leonardo Badino

    Abstract: This work addresses the mismatch problem between the distribution of training data (source) and testing data (target), in the challenging context of dysarthric speech recognition. We focus on Speaker Adaptation (SA) in command speech recognition, where data from multiple sources (i.e., multiple speakers) are available. Specifically, we propose an unsupervised Multi-Source Domain Adaptation (MSDA)…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: Submitted to Interspeech 2022

    Journal ref: Proc. Interspeech 2022

  4. arXiv:2104.02535  [pdf, other]

    cs.SD cs.CL eess.AS

    Optimal Transport-based Adaptation in Dysarthric Speech Tasks

    Authors: Rosanna Turrisi, Leonardo Badino

    Abstract: In many real-world applications, the mismatch between distributions of training data (source) and test data (target) significantly degrades the performance of machine learning algorithms. In speech data, causes of this mismatch include different acoustic environments or speaker characteristics. In this paper, we address this issue in the challenging context of dysarthric speech, by multi-source do…

    Submitted 14 March, 2022; v1 submitted 6 April, 2021; originally announced April 2021.

  5. arXiv:1912.02671  [pdf, other]

    eess.AS cs.LG eess.IV

    Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras

    Authors: Ander Arriandiaga, Giovanni Morrone, Luca Pasa, Leonardo Badino, Chiara Bartolozzi

    Abstract: We propose a method to address audio-visual target speaker enhancement in multi-talker environments using event-driven cameras. State-of-the-art audio-visual speech separation methods show that the crucial information is the movement of the facial landmarks related to speech production. However, all approaches proposed so far work offline, using frame-based video input, making it difficult to process…

    Submitted 22 February, 2021; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: Accepted at ISCAS 2021

  6. arXiv:1904.08248  [pdf, ps, other]

    eess.AS cs.CL cs.SD stat.ML

    An Analysis of Speech Enhancement and Recognition Losses in Limited Resources Multi-talker Single Channel Audio-Visual ASR

    Authors: Luca Pasa, Giovanni Morrone, Leonardo Badino

    Abstract: In this paper, we analyzed how audio-visual speech enhancement can help to perform the ASR task in a cocktail party scenario. To this end, we considered two simple end-to-end LSTM-based models that perform single-channel audio-visual speech enhancement and phone recognition, respectively. Then, we studied how the two models interact, and how training them jointly affects the final result. We analyzed…

    Submitted 27 November, 2019; v1 submitted 16 April, 2019; originally announced April 2019.

  7. arXiv:1811.02480  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Face Landmark-based Speaker-Independent Audio-Visual Speech Enhancement in Multi-Talker Environments

    Authors: Giovanni Morrone, Luca Pasa, Vadim Tikhanoff, Sonia Bergamaschi, Luciano Fadiga, Leonardo Badino

    Abstract: In this paper, we address the problem of enhancing the speech of a speaker of interest in a cocktail party scenario when visual information of the speaker of interest is available. Contrary to most previous studies, we do not learn visual features on the typically small audio-visual datasets, but use an already available face landmark detector (trained on a separate image dataset). The landmarks a…

    Submitted 2 May, 2019; v1 submitted 6 November, 2018; originally announced November 2018.

    Comments: Proceedings of 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)