Showing 1–15 of 15 results for author: Kurimo, M

Searching in archive eess.
  1. arXiv:2506.08717

    cs.CL cs.SD eess.AS

    Multi-Teacher Language-Aware Knowledge Distillation for Multilingual Speech Emotion Recognition

    Authors: Mehedi Hasan Bijoy, Dejan Porjazovski, Tamás Grósz, Mikko Kurimo

    Abstract: Speech Emotion Recognition (SER) is crucial for improving human-computer interaction. Despite strides in monolingual SER, extending them to build a multilingual system remains challenging. Our goal is to train a single model capable of multilingual SER by distilling knowledge from multiple teacher models. To address this, we introduce a novel language-aware multi-teacher knowledge distillation met…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: Accepted to INTERSPEECH 2025 conference

  2. arXiv:2506.01156

    cs.CL cs.SD eess.AS

    Mispronunciation Detection Without L2 Pronunciation Dataset in Low-Resource Setting: A Case Study in Finland Swedish

    Authors: Nhan Phan, Mikko Kuronen, Maria Kautonen, Riikka Ullakonoja, Anna von Zansen, Yaroslav Getman, Ekaterina Voskoboinik, Tamás Grósz, Mikko Kurimo

    Abstract: Mispronunciation detection (MD) models are the cornerstones of many language learning applications. Unfortunately, most systems are built for English and other major languages, while low-resourced language varieties, such as Finland Swedish (FS), lack such tools. In this paper, we introduce our MD model for FS, trained on 89 hours of first language (L1) speakers' spontaneous speech and tested on 3…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to Interspeech 2025 conference

  3. arXiv:2504.20678

    cs.CL eess.AS

    Non-native Children's Automatic Speech Assessment Challenge (NOCASA)

    Authors: Yaroslav Getman, Tamás Grósz, Mikko Kurimo, Giampiero Salvi

    Abstract: This paper presents the "Non-native Children's Automatic Speech Assessment" (NOCASA), a data competition that is part of the IEEE MLSP 2025 conference. NOCASA challenges participants to develop new systems that can assess single-word pronunciations of young second language (L2) learners as part of a gamified pronunciation training app. To achieve this, several issues must be addressed, most notably the l…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: First draft of the baseline paper for the NOCASA competition (https://teflon.aalto.fi/nocasa-2025/), 5 pages

  4. Out-of-distribution generalisation in spoken language understanding

    Authors: Dejan Porjazovski, Anssi Moisio, Mikko Kurimo

    Abstract: Test data is said to be out-of-distribution (OOD) when it unexpectedly differs from the training data, a common challenge in real-world use cases of machine learning. Although OOD generalisation has gained interest in recent years, few works have focused on OOD generalisation in spoken language understanding (SLU) tasks. To facilitate research on this topic, we introduce a modified version of the…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted for INTERSPEECH 2024

  5. Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference

    Authors: Dejan Porjazovski, Yaroslav Getman, Tamás Grósz, Mikko Kurimo

    Abstract: Large pre-trained models are essential in paralinguistic systems, demonstrating effectiveness in tasks like emotion recognition and stuttering detection. In this paper, we employ large pre-trained models for the ACM Multimedia Computational Paralinguistics Challenge, addressing the Requests and Emotion Share tasks. We explore audio-only and hybrid solutions leveraging audio and text modalities. Ou…

    Submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted at ACMM 2023

  6. arXiv:2307.11450

    eess.AS cs.CL

    Topic Identification For Spontaneous Speech: Enriching Audio Features With Embedded Linguistic Information

    Authors: Dejan Porjazovski, Tamás Grósz, Mikko Kurimo

    Abstract: Traditional topic identification solutions from audio rely on an automatic speech recognition system (ASR) to produce transcripts used as input to a text-based model. These approaches work well in high-resource scenarios, where there are sufficient data to train both components of the pipeline. However, in low-resource situations, the ASR system, even if available, produces low-quality transcripts…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: Accepted to EUSIPCO 2023

  7. arXiv:2210.15978

    eess.AS cs.SD

    End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks

    Authors: Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

    Abstract: The events of recent years have highlighted the importance of telemedicine solutions which could potentially allow remote treatment and diagnosis. Relatedly, Computational Paralinguistics, a unique subfield of Speech Processing, aims to extract information about the speaker and form an important part of telemedicine applications. In this work, we focus on two paralinguistic problems: mask detectio…

    Submitted 28 October, 2022; originally announced October 2022.

  8. arXiv:2208.05782

    eess.AS cs.CL cs.LG cs.SD

    Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

    Authors: Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

    Abstract: It is common knowledge that the quantity and quality of the training data play a significant role in the creation of a good machine learning model. In this paper, we take it one step further and demonstrate that the way the training examples are arranged is also of crucial importance. Curriculum Learning is built on the observation that organized and structured assimilation of knowledge has the ab…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: 5 pages, 2 figures, in Proceedings Interspeech 2022

    ACM Class: I.2.7; I.2.0

  9. arXiv:2203.14876

    cs.CL cs.SD eess.AS

    Finnish Parliament ASR corpus - Analysis, benchmarks and statistics

    Authors: Anja Virkkunen, Aku Rouhe, Nhan Phan, Mikko Kurimo

    Abstract: Public sources like parliament meeting recordings and transcripts provide ever-growing material for the training and evaluation of automatic speech recognition (ASR) systems. In this paper, we publish and analyse the Finnish parliament ASR corpus, the largest publicly available collection of manually transcribed speech data for Finnish with over 3000 hours of speech and 449 speakers for which it p…

    Submitted 28 March, 2022; originally announced March 2022.

    Comments: Submitted to Language Resources and Evaluation

  10. arXiv:2203.12906

    cs.CL eess.AS

    Lahjoita puhetta -- a large-scale corpus of spoken Finnish with some benchmarks

    Authors: Anssi Moisio, Dejan Porjazovski, Aku Rouhe, Yaroslav Getman, Anja Virkkunen, Tamás Grósz, Krister Lindén, Mikko Kurimo

    Abstract: The Donate Speech campaign has so far succeeded in gathering approximately 3600 hours of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the regions of Finland and from all age brackets. The primary goals of the collection were to create a representative, large-scale resource to study spontaneous spoke…

    Submitted 24 March, 2022; originally announced March 2022.

    Comments: Submitted to Language Resources and Evaluation

  11. arXiv:2008.12914

    eess.AS cs.CL

    Data augmentation using prosody and false starts to recognize non-native children's speech

    Authors: Hemant Kathania, Mittul Singh, Tamás Grósz, Mikko Kurimo

    Abstract: This paper describes AaltoASR's speech recognition system for the INTERSPEECH 2020 shared task on Automatic Speech Recognition (ASR) for non-native children's speech. The task is to recognize non-native speech from children of various age groups given a limited amount of speech. Moreover, the speech being spontaneous has false starts transcribed as partial words, which in the test transcriptions l…

    Submitted 29 August, 2020; originally announced August 2020.

  12. arXiv:2008.02689

    eess.AS cs.LG cs.SD

    Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge

    Authors: Tamás Grósz, Mittul Singh, Sudarsana Reddy Kadiri, Hemant Kathania, Mikko Kurimo

    Abstract: End-to-end neural network models (E2E) have shown significant performance benefits on different INTERSPEECH ComParE tasks. Prior work has applied either a single instance of an E2E model for a task or the same E2E architecture for different tasks. However, applying a single model is unstable or using the same architecture under-utilizes task-specific information. On ComParE 2020 tasks, we investig…

    Submitted 6 August, 2020; originally announced August 2020.

  13. Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search

    Authors: Mittul Singh, Sami Virpioja, Peter Smit, Mikko Kurimo

    Abstract: In spoken Keyword Search, the query may contain out-of-vocabulary (OOV) words not observed when training the speech recognition system. Using subword language models (LMs) in the first-pass recognition makes it possible to recognize the OOV words, but even the subword n-gram LMs suffer from data sparsity. Recurrent Neural Network (RNN) LMs alleviate the sparsity problems but are not suitable for f…

    Submitted 10 September, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: INTERSPEECH 2019

  14. arXiv:2003.11562

    cs.CL cs.LG cs.SD eess.AS stat.ML

    Finnish Language Modeling with Deep Transformer Models

    Authors: Abhilash Jain, Aku Rouhe, Stig-Arne Grönroos, Mikko Kurimo

    Abstract: Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures BERT and Transformer-XL for the language modeling task. We use a sub-word model setting with the Finnish language and compare it to the previous state-of-the-art (SOTA) L…

    Submitted 27 March, 2020; v1 submitted 14 March, 2020; originally announced March 2020.

    Comments: 4 pages

  15. arXiv:1905.02639

    eess.AS cs.SD

    Transparent pronunciation scoring using articulatorily weighted phoneme edit distance

    Authors: Reima Karhila, Anna-Riikka Smolander, Sari Ylinen, Mikko Kurimo

    Abstract: To research the effects of gamification in foreign language learning for children in the "Say It Again, Kid!" project, we developed a feedback paradigm that can drive gameplay in pronunciation learning games. We describe our scoring system based on the difference between a reference phone sequence and the output of a multilingual CTC phoneme recogniser. We present a white-box scoring model of mappe…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: Submitted to Interspeech 2019