Showing 1–6 of 6 results for author: Guðnason, J

Searching in archive cs.
  1. arXiv:2306.05709  [pdf, other]

    eess.AS cs.CL cs.SD

    Learning Emotional Representations from Imbalanced Speech Data for Speech Emotion Recognition and Emotional Text-to-Speech

    Authors: Shijun Wang, Jón Guðnason, Damian Borth

    Abstract: Effective speech emotional representations play a key role in Speech Emotion Recognition (SER) and Emotional Text-To-Speech (TTS) tasks. However, emotional speech samples are more difficult and expensive to acquire compared with Neutral style speech, which causes one issue that most related works unfortunately neglect: imbalanced datasets. Models might overfit to the majority Neutral class and fai…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: Accepted by INTERSPEECH2023

  2. arXiv:2303.01508  [pdf, other]

    cs.SD cs.AI eess.AS

    Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities

    Authors: Shijun Wang, Jón Guðnason, Damian Borth

    Abstract: State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas very often one would want fine-grained emotional control of words or phonemes. Although still challenging, the first TTS models have been recently proposed that are able to control voice by manually assigning emotion intensity…

    Submitted 11 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP2023

  3. arXiv:2208.04994  [pdf, other]

    cs.SD eess.AS

    Generative Data Augmentation Guided by Triplet Loss for Speech Emotion Recognition

    Authors: Shijun Wang, Hamed Hemati, Jón Guðnason, Damian Borth

    Abstract: Speech Emotion Recognition (SER) is crucial for human-computer interaction but still remains a challenging problem because of two major obstacles: data scarcity and imbalance. Many datasets for SER are substantially imbalanced, where data utterances of one class (most often Neutral) are much more frequent than those of other classes. Furthermore, only a few data resources are available for many ex…

    Submitted 9 August, 2022; originally announced August 2022.

    Comments: Published in INTERSPEECH 2022

  4. arXiv:2003.09244  [pdf, ps, other]

    cs.CL

    Language Technology Programme for Icelandic 2019-2023

    Authors: Anna Björk Nikulásdóttir, Jón Guðnason, Anton Karl Ingason, Hrafn Loftsson, Eiríkur Rögnvaldsson, Einar Freyr Sigurðsson, Steinþór Steingrímsson

    Abstract: In this paper, we describe a new national language technology programme for Icelandic. The programme, which spans a period of five years, aims at making Icelandic usable in communication and interactions in the digital world, by developing accessible, open-source language resources and software. The research and development work within the programme is carried out by a consortium of universities,…

    Submitted 20 March, 2020; originally announced March 2020.

    Comments: Accepted at LREC 2020

  5. arXiv:2001.00473  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Detection of Glottal Closure Instants from Speech Signals: a Quantitative Review

    Authors: Thomas Drugman, Mark Thomas, Jon Gudnason, Patrick Naylor, Thierry Dutoit

    Abstract: The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the Glottal Closure Instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six dif…

    Submitted 28 December, 2019; originally announced January 2020.

  6. arXiv:1807.11893  [pdf, other]

    eess.AS cs.SD

    Manual Post-editing of Automatically Transcribed Speeches from the Icelandic Parliament - Althingi

    Authors: Judy Y. Fong, Michal Borsky, Inga R. Helgadóttir, Jon Gudnason

    Abstract: The design objectives for an automatic transcription system are to produce text readable by humans and to minimize the impact on manual post-editing. This study reports on a recognition system used for transcribing speeches in the Icelandic parliament - Althingi. It evaluates the system performance and its effect on manual post-editing. The results are compared against the original manual transcri…

    Submitted 31 July, 2018; originally announced July 2018.

    Comments: Submitted to IEEE SLT 2018, Athens