Non-native Children's Automatic Speech Assessment Challenge (NOCASA)

Getman, Yaroslav; Grósz, Tamás; Kurimo, Mikko; Salvi, Giampiero

Computer Science > Computation and Language

arXiv:2504.20678 (cs)

[Submitted on 29 Apr 2025]

Title: Non-native Children's Automatic Speech Assessment Challenge (NOCASA)

Authors: Yaroslav Getman, Tamás Grósz, Mikko Kurimo, Giampiero Salvi

Abstract: This paper presents the "Non-native Children's Automatic Speech Assessment" (NOCASA) challenge, a data competition held as part of the IEEE MLSP 2025 conference. NOCASA challenges participants to develop new systems that can assess single-word pronunciations of young second language (L2) learners as part of a gamified pronunciation training app. To achieve this, several issues must be addressed, most notably the limited amount of available training data and the highly unbalanced distribution among the pronunciation level categories. To expedite development, we provide a pseudo-anonymized training dataset (TeflonNorL2) containing 10,334 recordings from 44 speakers attempting to pronounce 205 distinct Norwegian words, human-rated on a scale of 1 to 5 (the number of stars that should be given in the game). In addition to the data, two already trained systems are released as official baselines: an SVM classifier trained on the ComParE_16 acoustic feature set and a multi-task wav2vec 2.0 model. The latter achieves the best performance on the challenge test set, with an unweighted average recall (UAR) of 36.37%.
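Because the abstract reports results as unweighted average recall on heavily unbalanced rating categories, a minimal sketch of how that metric is usually computed may help. It assumes the standard definition (the mean of per-class recalls over the classes present in the reference labels); the function name and the toy ratings are illustrative assumptions and are not taken from the challenge data or the official evaluation script.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes that occur in y_true.

    Hypothetical helper for illustration; not the official NOCASA scorer.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # number of correctly predicted samples per class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Made-up star ratings (1 to 5), skewed toward the top score.
y_true = [5, 5, 5, 5, 5, 5, 3, 3, 1, 1]
y_pred = [5, 5, 5, 5, 5, 5, 5, 3, 5, 5]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.2f}, accuracy = {accuracy:.2f}")
```

In this toy case a predictor that favours the majority class reaches a higher accuracy than UAR, because every class contributes equally to UAR regardless of how many samples it has.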

Comments:	First draft of the baseline paper for the NOCASA competition (this https URL), 5 pages
Subjects:	Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2504.20678 [cs.CL]
	(or arXiv:2504.20678v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2504.20678

Submission history

From: Tamás Grósz
[v1] Tue, 29 Apr 2025 11:59:08 UTC (398 KB)