Showing 1–10 of 10 results for author: Baralis, E

Searching in archive eess.
  1. arXiv:2505.20176  [pdf, ps, other]

    cs.CL cs.LG eess.AS

    "KAN you hear me?" Exploring Kolmogorov-Arnold Networks for Spoken Language Understanding

    Authors: Alkis Koudounas, Moreno La Quatra, Eliana Pastor, Sabato Marco Siniscalchi, Elena Baralis

    Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional neural architectures, yet their application to speech processing remains underexplored. This work presents the first investigation of KANs for Spoken Language Understanding (SLU) tasks. We experiment with 2D-CNN models on two datasets, integrating KAN layers in five different configurations within th…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted at INTERSPEECH 2025

  2. arXiv:2505.20050  [pdf, ps, other]

    eess.AS cs.CL

    MVP: Multi-source Voice Pathology detection

    Authors: Alkis Koudounas, Moreno La Quatra, Gabriele Ciravegna, Marco Fantini, Erika Crosetti, Giovanni Succo, Tania Cerquitelli, Sabato Marco Siniscalchi, Elena Baralis

    Abstract: Voice disorders significantly impact patient quality of life, yet non-invasive automated diagnosis remains under-explored due to both the scarcity of pathological voice data and the variability in recording sources. This work introduces MVP (Multi-source Voice Pathology detection), a novel approach that leverages transformers operating directly on raw voice signals. We explore three fusion strate…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  3. arXiv:2505.19978  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    DeepDialogue: A Multi-Turn Emotionally-Rich Spoken Dialogue Dataset

    Authors: Alkis Koudounas, Moreno La Quatra, Elena Baralis

    Abstract: Recent advances in conversational AI have demonstrated impressive capabilities in single-turn responses, yet multi-turn dialogues remain challenging for even the most sophisticated language models. Current dialogue datasets are limited in their emotional range, domain diversity, turn depth, and are predominantly text-only, hindering progress in developing more human-like conversational systems acr…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Currently under review. See the official website: https://salt-research.github.io/DeepDialogue

  4. arXiv:2505.15700  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    "Alexa, can you forget me?" Machine Unlearning Benchmark in Spoken Language Understanding

    Authors: Alkis Koudounas, Claudio Savelli, Flavio Giobergia, Elena Baralis

    Abstract: Machine unlearning, the process of efficiently removing specific information from machine learning models, is a growing area of interest for responsible AI. However, few studies have explored the effectiveness of unlearning methods on complex tasks, particularly speech-related ones. This paper introduces UnSLU-BENCH, the first benchmark for machine unlearning in spoken language understanding (SLU)…

    Submitted 26 May, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  5. arXiv:2502.16298  [pdf, other]

    eess.AS cs.SD

    voc2vec: A Foundation Model for Non-Verbal Vocalization

    Authors: Alkis Koudounas, Moreno La Quatra, Marco Sabato Siniscalchi, Elena Baralis

    Abstract: Speech foundation models have demonstrated exceptional capabilities in speech-related tasks. Nevertheless, these models often struggle with non-verbal audio data, such as vocalizations, baby crying, etc., which are critical for various real-world applications. Audio foundation models handle non-speech data well but also fail to capture the nuanced features of non-verbal human sounds. In this work,…

    Submitted 22 February, 2025; originally announced February 2025.

    Comments: Accepted at ICASSP 2025

  6. Voice Disorder Analysis: a Transformer-based Approach

    Authors: Alkis Koudounas, Gabriele Ciravegna, Marco Fantini, Giovanni Succo, Erika Crosetti, Tania Cerquitelli, Elena Baralis

    Abstract: Voice disorders are pathologies significantly affecting patient quality of life. However, non-invasive automated diagnosis of these pathologies is still under-explored, due to both a shortage of pathological voice data and the diversity of the recording types used for the diagnosis. This paper proposes a novel solution that adopts transformers working directly on raw voice signals and addresses data…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

    Journal ref: Proc. Interspeech 2024, 3040-3044

  7. A Contrastive Learning Approach to Mitigate Bias in Speech Models

    Authors: Alkis Koudounas, Flavio Giobergia, Eliana Pastor, Elena Baralis

    Abstract: Speech models may be affected by performance imbalance in different population subgroups, raising concerns about fair treatment across these groups. Prior attempts to mitigate unfairness either focus on user-defined subgroups, potentially overlooking other affected subgroups, or do not explicitly improve the internal representation at the subgroup level. This paper proposes the first adoption of c…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

    Journal ref: Proc. Interspeech 2024, 827-831

  8. Benchmarking Representations for Speech, Music, and Acoustic Events

    Authors: Moreno La Quatra, Alkis Koudounas, Lorenzo Vaiani, Elena Baralis, Luca Cagliero, Paolo Garza, Sabato Marco Siniscalchi

    Abstract: Limited diversity in standardized benchmarks for evaluating audio representation learning (ARL) methods may hinder systematic comparison of current methods' capabilities. We present ARCH, a comprehensive benchmark for evaluating ARL methods on diverse audio classification domains, covering acoustic events, music, and speech. ARCH comprises 12 datasets that allow us to thoroughly assess pre-traine…

    Submitted 1 May, 2024; originally announced May 2024.

  9. arXiv:2309.07733  [pdf, other]

    cs.CL cs.SD eess.AS

    Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features

    Authors: Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis

    Abstract: Recent advances in eXplainable AI (XAI) have provided new insights into how models for vision, language, and tabular data operate. However, few approaches exist for understanding speech models. Existing work focuses on a few spoken language understanding (SLU) tasks, and explanations are difficult to interpret for most users. We introduce a new approach to explain speech classification models. We…

    Submitted 14 September, 2023; originally announced September 2023.

    Comments: 8 pages

  10. ITALIC: An Italian Intent Classification Dataset

    Authors: Alkis Koudounas, Moreno La Quatra, Lorenzo Vaiani, Luca Colomba, Giuseppe Attanasio, Eliana Pastor, Luca Cagliero, Elena Baralis

    Abstract: Recent large-scale Spoken Language Understanding datasets focus predominantly on English and do not account for language-specific phenomena such as particular phonemes or words in different lects. We introduce ITALIC, the first large-scale speech dataset designed for intent classification in Italian. The dataset comprises 16,521 crowdsourced audio samples recorded by 70 speakers from various Itali…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023. Data and code at https://github.com/RiTA-nlp/ITALIC