-
A Semi-Automatic Approach to Create Large Gender- and Age-Balanced Speaker Corpora: Usefulness of Speaker Diarization & Identification
Abstract: This paper presents a semi-automatic approach to create a diachronic corpus of voices balanced for speaker's age, gender, and recording period, according to 32 categories (2 genders, 4 age ranges and 4 recording periods). Corpora were selected at French National Institute of Audiovisual (INA) to obtain at least 30 speakers per category (a total of 960 speakers; only 874 have be found yet). For eac… ▽ More
Submitted 26 April, 2024; originally announced April 2024.
Comments: Keywords:, semi-automatic processing, corpus creation, diarization, speaker identification, gender-balanced, age-balanced, speaker corpus, diachrony
Journal ref: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 3271-3280, Marseille, 20-25 June 2022. European Language Resources Association (ELRA)