Showing 1–22 of 22 results for author: Villatoro-Tello, E

  1. arXiv:2506.10622  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: The advancement of conversational AI systems relies on the availability of high-quality, flexible, and reproducible synthetic dialogues for training, evaluation, and benchmarking. SDialog is a modular, extensible Python toolkit designed to address the challenges of synthetic dialogue generation and analysis. By leveraging instruction-tuned Large Language Models (LLMs), SDialog provides abstraction…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: https://github.com/idiap/sdialog

  2. arXiv:2506.04981  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering

    Authors: Andres Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esau Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. But unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over no auxilia…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  3. arXiv:2506.03681  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

    Authors: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple sel…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  4. arXiv:2505.11280  [pdf, ps, other]

    cs.CL

    Temporal fine-tuning for early risk detection

    Authors: Horacio Thompson, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Marcelo Errecalde

    Abstract: Early Risk Detection (ERD) on the Web aims to promptly identify users facing social and health issues. Users are analyzed post-by-post, and it is necessary to guarantee correct and quick answers, which is particularly challenging in critical scenarios. ERD involves optimizing classification precision and minimizing detection delay. Standard classification metrics may not suffice, resorting to spec…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: In: Proceedings of the 53rd JAIIO / 50th CLEI - ASAID, 2024, p. 137. ISSN: 2451-7496

  5. arXiv:2411.03866  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

    Authors: Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke

    Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and speech perturbations. In th…

    Submitted 22 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Accepted in ICASSP 2025 SALMA Workshop

    Journal ref: Proc. ICASSP Workshop on Speech and Audio Language Models (SALMA), 2025

  6. Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions

    Authors: Dairazalia Sánchez-Cortés, Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: Bias assessment of news sources is paramount for professionals, organizations, and researchers who rely on truthful evidence for information gathering and reporting. While certain bias indicators are discernible from content analysis, descriptors like political bias and fake news pose greater challenges. In this paper, we propose an extension to a recently presented news media reliability estimati…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted to CLEF 2024

  7. arXiv:2409.13514  [pdf, other]

    cs.CL cs.SD eess.AS

    LM-assisted keyword biasing with Aho-Corasick algorithm for Transducer-based ASR

    Authors: Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Andres Carofilis, Shashi Kumar, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: Despite the recent success of end-to-end models for automatic speech recognition, recognizing special rare and out-of-vocabulary words, as well as fast domain adaptation with text, are still challenging. It often happens that biasing to the special entities leads to a degradation in the overall performance. We propose a light on-the-fly method to improve automatic speech recognition performance by…

    Submitted 20 September, 2024; originally announced September 2024.

    Comments: Submitted to ICASSP2025

  8. arXiv:2409.13499  [pdf, other]

    cs.CL cs.SD eess.AS

    Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper

    Authors: Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Shashi Kumar, Pradeep Rangappa, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: The training of automatic speech recognition (ASR) with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained from scratch on consumer and accessible GPUs in their entirety with pseudo-labeled (PL) speech from foundational speech models (FSM). This allows training a robust ASR model just in one stage and…

    Submitted 7 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP Findings 2024

  9. arXiv:2407.04444  [pdf, other]

    cs.CL cs.SD eess.AS

    TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Thorbecke, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at EMNLP 2024 (Main Conference)

  10. arXiv:2407.04439  [pdf, other]

    eess.AS

    XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Iuliia Thorbecke, Petr Motlicek, Manjunath K E, Aravind Ganapathiraju

    Abstract: Self-supervised pretrained models exhibit competitive performance in automatic speech recognition on finetuning, even with limited in-domain supervised data. However, popular pretrained models are not suitable for streaming ASR because they are trained with full attention context. In this paper, we introduce XLSR-Transducer, where the XLSR-53 model is used as encoder in transducer setup. Our exper…

    Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages, double column

  11. arXiv:2404.14463  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews

    Authors: Sergio Burdisso, Ernesto Reyes-Ramírez, Esaú Villatoro-Tello, Fernando Sánchez-Vega, Pastor López-Monroy, Petr Motlicek

    Abstract: Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating interviewer's prompts into the model. In this work, we hypothesize that this improvement might be mainly due t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to Clinical NLP workshop at NAACL 2024

  12. arXiv:2404.09565  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Reliability Estimation of News Media Sources: Birds of a Feather Flock Together

    Authors: Sergio Burdisso, Dairazalia Sánchez-Cortés, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: Evaluating the reliability of news sources is a routine task for journalists and organizations committed to acquiring and disseminating accurate information. Recent research has shown that predicting sources' reliability represents an important first-prior step in addressing additional challenges such as fake news detection and fact-checking. In this paper, we introduce a novel approach for source…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Main Conference

  13. Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek

    Abstract: We propose a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and show its impact on depression detection from transcribed clinical interviews. To this end, we use a GCN for modeling non-consecutive and long-distance semantics to classify the transcriptions into depressed or control subjects. The proposed method aims to mitigate the limiting assumptions of…

    Submitted 11 March, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Paper Accepted to Interspeech 2023

    Journal ref: Interspeech 2023

  14. arXiv:2306.15685  [pdf, other]

    eess.AS cs.CL

    Implementing contextual biasing in GPU decoder for online ASR

    Authors: Iuliia Nigmatulina, Srikanth Madikeri, Esaú Villatoro-Tello, Petr Motliček, Juan Zuluaga-Gomez, Karthik Pandia, Aravind Ganapathiraju

    Abstract: GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted to Interspeech 2023

  15. arXiv:2212.08489  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks

    Authors: Esaú Villatoro-Tello, Srikanth Madikeri, Juan Zuluaga-Gomez, Bidisha Sharma, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Petr Motlicek, Alexei V. Ivanov, Aravind Ganapathiraju

    Abstract: In this paper, we perform an exhaustive evaluation of different representations to address the intent classification problem in a Spoken Language Understanding (SLU) setup. We benchmark three types of systems to perform the SLU intent detection task: 1) text-based, 2) lattice-based, and a novel 3) multimodal approach. Our work provides a comprehensive analysis of what could be the achievable perfo…

    Submitted 17 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: Accepted in ICASSP 2023

    ACM Class: I.2.7

    Journal ref: ICASSP 2023

  16. IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

    Authors: Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek

    Abstract: In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction appr…

    Submitted 14 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: To be published in CASE@EMNLP 2022 (5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)

    Journal ref: CASE @ EMNLP 2022

  17. arXiv:2209.03891  [pdf, other]

    cs.CL cs.AI

    IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

    Authors: Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

    Abstract: In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in a sentence from news media. We detect cause-effect-signal spans in a sentence using T5 -- a pre-trained autoregressive language model. We iteratively identify all cau…

    Submitted 20 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Camera-ready for CASE@EMNLP

  18. arXiv:1909.09914  [pdf, other]

    cs.SI

    Predicting consumers engagement on Facebook based on what and how companies write

    Authors: Érika S. Rosas-Quezada, Gabriela Ramírez-de-la-Rosa, Esaú Villatoro-Tello

    Abstract: Engaged customers are a very important part of current social media marketing. Public figures and brands have to be very careful about what to post online. That is why the need for accurate strategies for anticipating the impact of a post written for an online audience is critical to any public brand. Therefore, in this paper, we propose a method to predict the impact of a given post by accounting fo…

    Submitted 21 September, 2019; originally announced September 2019.

    Comments: Accepted at LKE 2019

  19. A Comparative Analysis of Distributional Term Representations for Author Profiling in Social Media

    Authors: Miguel Á. Álvarez-Carmona, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda

    Abstract: Author Profiling (AP) aims at predicting specific characteristics from a group of authors by analyzing their written documents. Much research has focused on determining suitable features for modeling writing patterns from authors. Reported results indicate that content-based features continue to be the most relevant and discriminant features for solving this task. Thus, in this paper, we pres…

    Submitted 21 May, 2019; originally announced May 2019.

    Journal ref: Journal of Intelligent & Fuzzy Systems, vol. 36, no. 5, pp. 4857-4868, 2019

  20. TxPI-u: A Resource for Personality Identification of Undergraduates

    Authors: Gabriela Ramírez-de-la-Rosa, Esaú Villatoro-Tello, Héctor Jiménez-Salazar

    Abstract: Resources such as labeled corpora are necessary to train automatic models within the natural language processing (NLP) field. Historically, a large number of resources regarding a broad number of problems are available mostly in English. One such problem is known as Personality Identification where, based on a psychological model (e.g. The Big Five Model), the goal is to find the traits of a su…

    Submitted 20 June, 2018; originally announced June 2018.

    Journal ref: Journal of Intelligent & Fuzzy Systems, vol. 34, no. 5, pp. 2991-3001, 2018

  21. Semantically-informed distance and similarity measures for paraphrase plagiarism identification

    Authors: Miguel A. Álvarez-Carmona, Marc Franco-Salvador, Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor-Pineda

    Abstract: Paraphrase plagiarism identification represents a very complex task given that plagiarized texts are intentionally modified through several rewording techniques. Accordingly, this paper introduces two new measures for evaluating the relatedness of two given texts: a semantically-informed similarity measure and a semantically-informed edit distance. Both measures are able to extract semantic inform…

    Submitted 29 May, 2018; originally announced May 2018.

    Journal ref: Journal of Intelligent & Fuzzy Systems, vol. 34, no. 5, pp. 2983-2990, 2018

  22. A visual approach for age and gender identification on Twitter

    Authors: Miguel A. Alvarez-Carmona, Luis Pellegrin, Manuel Montes-y-Gómez, Fernando Sánchez-Vega, Hugo Jair Escalante, A. Pastor López-Monroy, Luis Villaseñor-Pineda, Esaú Villatoro-Tello

    Abstract: The goal of Author Profiling (AP) is to identify demographic aspects (e.g., age, gender) from a given set of authors by analyzing their written texts. Recently, the AP task has gained interest in many problems related to computer forensics, psychology, marketing, but especially in those related to social media exploitation. As known, social media data is shared through a wide range of modalities…

    Submitted 28 May, 2018; originally announced May 2018.

    Journal ref: Miguel A. Alvarez-Carmona, Luis Pellegrin et al. A visual approach for age and gender identification on Twitter. Journal of Intelligent and Fuzzy Systems, vol. 34, no. 5, pp. 3133-3145, 2018