Showing 1–9 of 9 results for author: Kakouros, S

  1. arXiv:2506.02283  [pdf, ps, other]

    cs.CL eess.AS

    Sounding Like a Winner? Prosodic Differences in Post-Match Interviews

    Authors: Sofoklis Kakouros, Haoyu Chen

    Abstract: This study examines the prosodic characteristics associated with winning and losing in post-match tennis interviews. Additionally, this research explores the potential to classify match outcomes solely based on post-match interview recordings using prosodic features and self-supervised learning (SSL) representations. By analyzing prosodic elements such as pitch and intensity, alongside SSL models…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  2. arXiv:2506.02239  [pdf, ps, other]

    cs.CL eess.AS

    Investigating the Impact of Word Informativeness on Speech Emotion Recognition

    Authors: Sofoklis Kakouros

    Abstract: In emotion recognition from speech, a key challenge lies in identifying speech signal segments that carry the most relevant acoustic variations for discerning specific emotions. Traditional approaches compute functionals for features such as energy and F0 over entire sentences or longer speech portions, potentially missing essential fine-grained variation in the long-form statistics. This research…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025

  3. arXiv:2306.09814  [pdf, other]

    eess.AS cs.CL

    Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody

    Authors: Sofoklis Kakouros, Juraj Šimko, Martti Vainio, Antti Suni

    Abstract: This paper investigates the use of word surprisal, a measure of the predictability of a word in a given context, as a feature to aid speech synthesis prosody. We explore how word surprisal extracted from large language models (LLMs) correlates with word prominence, a signal-based measure of the salience of a word in a given discourse. We also examine how context length and LLM size affect the resu…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted at SSW 2023

  4. arXiv:2305.16040  [pdf, other]

    eess.AS

    The Power of Prosody and Prosody of Power: An Acoustic Analysis of Finnish Parliamentary Speech

    Authors: Martti Vainio, Antti Suni, Juraj Šimko, Sofoklis Kakouros

    Abstract: Parliamentary recordings provide a rich source of data for studying how politicians use speech to convey their messages and influence their audience. This provides a unique context for studying how politicians use speech, especially prosody, to achieve their goals. Here we analyzed a corpus of parliamentary speeches in the Finnish parliament between the years 2008-2020 and highlight methodological…

    Submitted 25 May, 2023; originally announced May 2023.

  5. arXiv:2305.11864  [pdf, other]

    eess.AS cs.CL

    North Sámi Dialect Identification with Self-supervised Speech Models

    Authors: Sofoklis Kakouros, Katri Hiovain-Asikainen

    Abstract: The North Sámi (NS) language encapsulates four primary dialectal variants that are related but that also have differences in their phonology, morphology, and vocabulary. The unique geopolitical location of NS speakers means that in many cases they are bilingual in Sámi as well as in the dominant state language: Norwegian, Swedish, or Finnish. This enables us to study the NS variants both with resp…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at Interspeech 2023

  6. arXiv:2211.01756  [pdf, other]

    eess.AS cs.SD

    Speech-based emotion recognition with self-supervised models using attentive channel-wise correlations and label smoothing

    Authors: Sofoklis Kakouros, Themos Stafylakis, Ladislav Mosner, Lukas Burget

    Abstract: When recognizing emotions from speech, we encounter two common problems: how to optimally capture emotion-relevant information from the speech signal and how to best quantify or categorize the noisy subjective emotion labels. Self-supervised pre-trained representations can robustly capture information from speech enabling state-of-the-art results in many downstream tasks including emotion recognit…

    Submitted 3 November, 2022; originally announced November 2022.

    Comments: Submitted to IEEE-ICASSP 2023

  7. arXiv:2210.09513  [pdf, other]

    eess.AS cs.SD

    Extracting speaker and emotion information from self-supervised speech models via channel-wise correlations

    Authors: Themos Stafylakis, Ladislav Mosner, Sofoklis Kakouros, Oldrich Plchot, Lukas Burget, Jan Cernocky

    Abstract: Self-supervised learning of speech representations from large amounts of unlabeled data has enabled state-of-the-art results in several speech processing tasks. Aggregating these speech representations across time is typically approached by using descriptive statistics, and in particular, using the first- and second-order statistics of representation coefficients. In this paper, we examine an alte…

    Submitted 15 October, 2022; originally announced October 2022.

    Comments: Accepted at IEEE-SLT 2022

  8. Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis

    Authors: Antti Suni, Sofoklis Kakouros, Martti Vainio, Juraj Šimko

    Abstract: Recent advances in deep learning methods have elevated synthetic speech quality to human level, and the field is now moving towards addressing prosodic variation in synthetic speech. Despite successes in this effort, the state-of-the-art systems fall short of faithfully reproducing local prosodic events that give rise to, e.g., word-level emphasis and phrasal structure. This type of prosodic variat…

    Submitted 29 June, 2020; originally announced June 2020.

  9. Dialect Identification of Spoken North Sámi Language Varieties Using Prosodic Features

    Authors: Sofoklis Kakouros, Katri Hiovain, Martti Vainio, Juraj Šimko

    Abstract: This work explores the application of various supervised classification approaches using prosodic information for the identification of spoken North Sámi language varieties. Dialects are language varieties that enclose characteristics specific for a given region or community. These characteristics reflect segmental and suprasegmental (prosodic) differences but also high-level properties such as le…

    Submitted 23 March, 2020; originally announced March 2020.