
Showing 1–16 of 16 results for author: Burdisso, S

Searching in archive cs.
  1. arXiv:2506.10622

    cs.CL cs.AI cs.LG

    SDialog: A Python Toolkit for Synthetic Dialogue Generation and Analysis

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: The advancement of conversational AI systems relies on the availability of high-quality, flexible, and reproducible synthetic dialogues for training, evaluation, and benchmarking. SDialog is a modular, extensible Python toolkit designed to address the challenges of synthetic dialogue generation and analysis. By leveraging instruction-tuned Large Language Models (LLMs), SDialog provides abstraction…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: https://github.com/idiap/sdialog

  2. arXiv:2506.04981

    cs.CL cs.SD eess.AS

    Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering

    Authors: Andres Carofilis, Pradeep Rangappa, Srikanth Madikeri, Shashi Kumar, Sergio Burdisso, Jeena Prakash, Esau Villatoro-Tello, Petr Motlicek, Bidisha Sharma, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging when labeled data is scarce. But unlabeled audio and labeled data from related domains are often available. We propose an incremental semi-supervised learning pipeline that first integrates a small in-domain labeled set and an auxiliary dataset from a closely related domain, achieving a relative improvement of 4% over no auxilia…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  3. arXiv:2506.03681

    cs.CL cs.SD eess.AS

    Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering

    Authors: Pradeep Rangappa, Andres Carofilis, Jeena Prakash, Shashi Kumar, Sergio Burdisso, Srikanth Madikeri, Esau Villatoro-Tello, Bidisha Sharma, Petr Motlicek, Kadri Hacioglu, Shankar Venkatesan, Saurabh Vyas, Andreas Stolcke

    Abstract: Fine-tuning pretrained ASR models for specific domains is challenging for small organizations with limited labeled data and computational resources. Here, we explore different data selection pipelines and propose a robust approach that improves ASR adaptation by filtering pseudo-labels generated using Whisper (encoder-decoder) and Zipformer (transducer) models. Our approach integrates multiple sel…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted at Interspeech 2025, Netherlands

  4. arXiv:2411.03866

    cs.CL cs.AI cs.SD eess.AS

    Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward

    Authors: Shashi Kumar, Iuliia Thorbecke, Sergio Burdisso, Esaú Villatoro-Tello, Manjunath K E, Kadri Hacioğlu, Pradeep Rangappa, Petr Motlicek, Aravind Ganapathiraju, Andreas Stolcke

    Abstract: Recent research has demonstrated that training a linear connector between speech foundation encoders and large language models (LLMs) enables this architecture to achieve strong ASR capabilities. Despite the impressive results, it remains unclear whether these simple approaches are robust enough across different scenarios and speech conditions, such as domain shifts and speech perturbations. In th…

    Submitted 22 January, 2025; v1 submitted 6 November, 2024; originally announced November 2024.

    Comments: Accepted to the ICASSP 2025 SALMA Workshop

    Journal ref: Proc. ICASSP Workshop on Speech and Audio Language Models (SALMA), 2025

  5. arXiv:2410.18481

    cs.CL cs.AI cs.LG

    Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction

    Authors: Sergio Burdisso, Srikanth Madikeri, Petr Motlicek

    Abstract: Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could significantly accelerate the manual design of workflows in new domains and enable the grounding of large language models in domain-specific flowcharts, enhancing transparency and controllability. In this paper, we introduce…

    Submitted 5 November, 2024; v1 submitted 24 October, 2024; originally announced October 2024.

    Comments: Accepted to EMNLP 2024 main conference

    Journal ref: https://aclanthology.org/2024.emnlp-main.310/

  6. Mapping the Media Landscape: Predicting Factual Reporting and Political Bias Through Web Interactions

    Authors: Dairazalia Sánchez-Cortés, Sergio Burdisso, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: Bias assessment of news sources is paramount for professionals, organizations, and researchers who rely on truthful evidence for information gathering and reporting. While certain bias indicators are discernible from content analysis, descriptors like political bias and fake news pose greater challenges. In this paper, we propose an extension to a recently presented news media reliability estimati…

    Submitted 23 October, 2024; originally announced October 2024.

    Comments: Accepted to CLEF 2024

  7. arXiv:2409.13499

    cs.CL cs.SD eess.AS

    Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper

    Authors: Iuliia Thorbecke, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Shashi Kumar, Pradeep Rangappa, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: Training automatic speech recognition (ASR) models with little to no supervised data remains an open question. In this work, we demonstrate that streaming Transformer-Transducer (TT) models can be trained entirely from scratch on consumer-grade, accessible GPUs using pseudo-labeled (PL) speech from foundational speech models (FSM). This allows training a robust ASR model in a single stage and…

    Submitted 7 October, 2024; v1 submitted 20 September, 2024; originally announced September 2024.

    Comments: Accepted to EMNLP Findings 2024

  8. arXiv:2407.04444

    cs.CL cs.SD eess.AS

    TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR

    Authors: Shashi Kumar, Srikanth Madikeri, Juan Zuluaga-Gomez, Iuliia Thorbecke, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Karthik Pandia, Aravind Ganapathiraju

    Abstract: In traditional conversational intelligence from speech, a cascaded pipeline is used, involving tasks such as voice activity detection, diarization, transcription, and subsequent processing with different NLP models for tasks like semantic endpointing and named entity recognition (NER). Our paper introduces TokenVerse, a single Transducer-based model designed to handle multiple tasks. This is achie…

    Submitted 8 October, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: Accepted at EMNLP 2024 (Main Conference)

  9. arXiv:2404.14463

    cs.CL cs.AI cs.CY cs.LG

    DAIC-WOZ: On the Validity of Using the Therapist's prompts in Automatic Depression Detection from Clinical Interviews

    Authors: Sergio Burdisso, Ernesto Reyes-Ramírez, Esaú Villatoro-Tello, Fernando Sánchez-Vega, Pastor López-Monroy, Petr Motlicek

    Abstract: Automatic depression detection from conversational data has gained significant interest in recent years. The DAIC-WOZ dataset, which consists of interviews conducted by a human-controlled virtual agent, has been widely used for this task. Recent studies have reported enhanced performance when incorporating the interviewer's prompts into the model. In this work, we hypothesize that this improvement might be mainly due t…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted to Clinical NLP workshop at NAACL 2024

  10. arXiv:2404.09565

    cs.CL cs.AI cs.CY cs.LG

    Reliability Estimation of News Media Sources: Birds of a Feather Flock Together

    Authors: Sergio Burdisso, Dairazalia Sánchez-Cortés, Esaú Villatoro-Tello, Petr Motlicek

    Abstract: Evaluating the reliability of news sources is a routine task for journalists and organizations committed to acquiring and disseminating accurate information. Recent research has shown that predicting sources' reliability is an important first step in addressing additional challenges such as fake news detection and fact-checking. In this paper, we introduce a novel approach for source…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to NAACL 2024 Main Conference

  11. Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews

    Authors: Sergio Burdisso, Esaú Villatoro-Tello, Srikanth Madikeri, Petr Motlicek

    Abstract: We propose a simple approach for weighting self-connecting edges in a Graph Convolutional Network (GCN) and show its impact on depression detection from transcribed clinical interviews. To this end, we use a GCN for modeling non-consecutive and long-distance semantics to classify the transcriptions into depressed or control subjects. The proposed method aims to mitigate the limiting assumptions of…

    Submitted 11 March, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: Paper accepted to Interspeech 2023

    Journal ref: Interspeech 2023

  12. IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach

    Authors: Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek

    Abstract: In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction appr…

    Submitted 14 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: To be published in CASE@EMNLP 2022 (5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text)

    Journal ref: CASE @ EMNLP 2022

  13. arXiv:2209.03891

    cs.CL cs.AI

    IDIAPers @ Causal News Corpus 2022: Extracting Cause-Effect-Signal Triplets via Pre-trained Autoregressive Language Model

    Authors: Martin Fajcik, Muskaan Singh, Juan Zuluaga-Gomez, Esaú Villatoro-Tello, Sergio Burdisso, Petr Motlicek, Pavel Smrz

    Abstract: In this paper, we describe our shared task submissions for Subtask 2 in CASE-2022, Event Causality Identification with Causal News Corpus. The challenge focused on the automatic detection of all cause-effect-signal spans present in sentences from news media. We detect cause-effect-signal spans in a sentence using T5, a pre-trained autoregressive language model. We iteratively identify all cau…

    Submitted 20 October, 2022; v1 submitted 8 September, 2022; originally announced September 2022.

    Comments: Camera-ready for CASE@EMNLP

  14. arXiv:1912.09322

    cs.LG cs.AI cs.IR cs.SE stat.ML

    PySS3: A Python package implementing a novel text classifier with visualization tools for Explainable AI

    Authors: Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez

    Abstract: A recently introduced text classifier, called SS3, has obtained state-of-the-art performance on the CLEF eRisk tasks. SS3 was created to deal with risk detection over text streams and, therefore, not only supports incremental training and classification but also can visually explain its rationale. However, little attention has been paid to the potential use of SS3 as a general classifier. We bel…

    Submitted 17 July, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

  15. t-SS3: a text classifier with dynamic n-grams for early risk detection over text streams

    Authors: Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez

    Abstract: A recently introduced classifier, called SS3, has been shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF eRisk open tasks. SS3 was created to deal with ERD problems naturally since it supports incremental training and classification over text streams, and…

    Submitted 6 May, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

    Comments: Highlights: (*) A classifier that is able to dynamically learn and recognize important word n-grams. (*) A novel text classifier having the ability to visually explain its rationale. (*) Support for incremental learning and text classification over streams. (*) Efficient model for addressing early risk detection problems

    Journal ref: Pattern Recognition Letters, Elsevier, 2020

  16. arXiv:1905.08772

    cs.CY cs.CL cs.IR cs.LG cs.SI

    A Text Classification Framework for Simple and Effective Early Depression Detection Over Social Media Streams

    Authors: Sergio G. Burdisso, Marcelo Errecalde, Manuel Montes-y-Gómez

    Abstract: With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users p…

    Submitted 17 April, 2024; v1 submitted 18 May, 2019; originally announced May 2019.

    Comments: Highlights: (*) A novel text classifier having the ability to visually explain its rationale; (*) Domain-independent classification that does not require feature engineering; (*) Support for incremental learning and text classification over streams; (*) Efficient framework for addressing early risk detection problems; (*) State-of-the-art performance on early depression detection task

    Journal ref: 18 May 2019, Volume 133, Expert Systems With Applications, Elsevier