Search | arXiv e-print repository

arXiv:2505.20006 [pdf, ps, other]

Mixture of LoRA Experts for Low-Resourced Multi-Accent Automatic Speech Recognition

Authors: Raphaël Bagat, Irina Illina, Emmanuel Vincent

Abstract: We aim to improve the robustness of Automatic Speech Recognition (ASR) systems against non-native speech, particularly in low-resourced multi-accent settings. We introduce Mixture of Accent-Specific LoRAs (MAS-LoRA), a fine-tuning method that leverages a mixture of Low-Rank Adaptation (LoRA) experts, each specialized in a specific accent. This method can be used when the accent is known or unknown… ▽ More We aim to improve the robustness of Automatic Speech Recognition (ASR) systems against non-native speech, particularly in low-resourced multi-accent settings. We introduce Mixture of Accent-Specific LoRAs (MAS-LoRA), a fine-tuning method that leverages a mixture of Low-Rank Adaptation (LoRA) experts, each specialized in a specific accent. This method can be used when the accent is known or unknown at inference time, without the need to fine-tune the model again. Our experiments, conducted using Whisper on the L2-ARCTIC corpus, demonstrate significant improvements in Word Error Rate compared to regular LoRA and full fine-tuning when the accent is unknown. When the accent is known, the results further improve. Furthermore, MAS-LoRA shows less catastrophic forgetting than the other fine-tuning methods. To the best of our knowledge, this is the first use of a mixture of LoRA experts for non-native multi-accent ASR. △ Less

Submitted 26 May, 2025; originally announced May 2025.

Comments: Submitted to Interspeech 2025

arXiv:2412.16719 [pdf, other]

Lillama: Large Language Models Compression via Low-Rank Feature Distillation

Authors: Yaya Sy, Christophe Cerisara, Irina Illina

Abstract: Current LLM structured pruning methods typically involve two steps: (1) compression with calibration data and (2) costly continued pretraining on billions of tokens to recover lost performance. This second step is necessary as the first significantly impacts model accuracy. Prior research suggests pretrained Transformer weights aren't inherently low-rank, unlike their activations, which may explai… ▽ More Current LLM structured pruning methods typically involve two steps: (1) compression with calibration data and (2) costly continued pretraining on billions of tokens to recover lost performance. This second step is necessary as the first significantly impacts model accuracy. Prior research suggests pretrained Transformer weights aren't inherently low-rank, unlike their activations, which may explain this drop. Based on this observation, we propose Lillama, a compression method that locally distills activations with low-rank weights. Using SVD for initialization and a joint loss combining teacher and student activations, we accelerate convergence and reduce memory use with local gradient updates. Lillama compresses Mixtral-8x7B within minutes on a single A100 GPU, removing 10 billion parameters while retaining over 95% of its original performance. Phi-2 3B can be compressed by 40% with just 13 million calibration tokens, resulting in a small model that competes with recent models of similar size. The method generalizes well to non-transformer architectures, compressing Mamba-3B by 20% while maintaining 99% performance. △ Less

Submitted 28 December, 2024; v1 submitted 21 December, 2024; originally announced December 2024.

Comments: 20 pages, 8 figures

arXiv:2307.16582 [pdf, other]

SAMbA: Speech enhancement with Asynchronous ad-hoc Microphone Arrays

Authors: Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Abstract: Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is sui… ▽ More Speech enhancement in ad-hoc microphone arrays is often hindered by the asynchronization of the devices composing the microphone array. Asynchronization comes from sampling time offset and sampling rate offset which inevitably occur when the microphones are embedded in different hardware components. In this paper, we propose a deep neural network (DNN)-based speech enhancement solution that is suited for applications in ad-hoc microphone arrays because it is distributed and copes with asynchronization. We show that asynchronization has a limited impact on the spatial filtering and mostly affects the performance of the DNNs. Instead of resynchronising the signals, which requires costly processing steps, we use an attention mechanism which makes the DNNs, thus our whole pipeline, robust to asynchronization. We also show that the attention mechanism leads to the asynchronization parameters in an unsupervised manner. △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: Submitted to INTERSPEECH 2022

arXiv:2210.09340 [pdf, other]

Transferring Knowledge via Neighborhood-Aware Optimal Transport for Low-Resource Hate Speech Detection

Authors: Tulika Bose, Irina Illina, Dominique Fohr

Abstract: The concerning rise of hateful content on online platforms has increased the attention towards automatic hate speech detection, commonly formulated as a supervised classification task. State-of-the-art deep learning-based approaches usually require a substantial amount of labeled resources for training. However, annotating hate speech resources is expensive, time-consuming, and often harmful to th… ▽ More The concerning rise of hateful content on online platforms has increased the attention towards automatic hate speech detection, commonly formulated as a supervised classification task. State-of-the-art deep learning-based approaches usually require a substantial amount of labeled resources for training. However, annotating hate speech resources is expensive, time-consuming, and often harmful to the annotators. This creates a pressing need to transfer knowledge from the existing labeled resources to low-resource hate speech corpora with the goal of improving system performance. For this, neighborhood-based frameworks have been shown to be effective. However, they have limited flexibility. In our paper, we propose a novel training strategy that allows flexible modeling of the relative proximity of neighbors retrieved from a resource-rich corpus to learn the amount of transfer. In particular, we incorporate neighborhood information with Optimal Transport, which permits exploiting the geometry of the data embedding space. By aligning the joint embedding and label distributions of neighbors, we demonstrate substantial improvements over strong baselines, in low-resource scenarios, on different publicly available hate speech corpora. △ Less

Submitted 17 October, 2022; originally announced October 2022.

Comments: AACL-IJCNLP 2022 preprint

arXiv:2209.08681 [pdf, other]

Domain Classification-based Source-specific Term Penalization for Domain Adaptation in Hate-speech Detection

Authors: Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This occurs, typically, due to classifiers overemphasizing source-specific information that negatively impacts its domain invariance. Prior work has attempted to penalize terms related to hate-speech from manually curated lists using feature attribution methods, which quantify the impo… ▽ More State-of-the-art approaches for hate-speech detection usually exhibit poor performance in out-of-domain settings. This occurs, typically, due to classifiers overemphasizing source-specific information that negatively impacts its domain invariance. Prior work has attempted to penalize terms related to hate-speech from manually curated lists using feature attribution methods, which quantify the importance assigned to input terms by the classifier when making a prediction. We, instead, propose a domain adaptation approach that automatically extracts and penalizes source-specific terms using a domain classifier, which learns to differentiate between domains, and feature-attribution scores for hate-speech classes, yielding consistent improvements in cross-domain evaluation. △ Less

Submitted 18 September, 2022; originally announced September 2022.

Comments: COLING 2022 pre-print

arXiv:2204.13400 [pdf, other]

Placing M-Phasis on the Plurality of Hate: A Feature-Based Corpus of Hate Online

Authors: Dana Ruiter, Liane Reiners, Ashwin Geet D'Sa, Thomas Kleinbauer, Dominique Fohr, Irina Illina, Dietrich Klakow, Christian Schemer, Angeliki Monnier

Abstract: Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus… ▽ More Even though hate speech (HS) online has been an important object of research in the last decade, most HS-related corpora over-simplify the phenomenon of hate by attempting to label user comments as "hate" or "neutral". This ignores the complex and subjective nature of HS, which limits the real-life applicability of classifiers trained on these corpora. In this study, we present the M-Phasis corpus, a corpus of ~9k German and French user comments collected from migration-related news articles. It goes beyond the "hate"-"neutral" dichotomy and is instead annotated with 23 features, which in combination become descriptors of various types of speech, ranging from critical comments to implicit and explicit expressions of hate. The annotations are performed by 4 native speakers per language and achieve high (0.77 <= k <= 1) inter-annotator agreements. Besides describing the corpus creation and presenting insights from a content, error and domain analysis, we explore its data characteristics by training several classification baselines. △ Less

Submitted 28 April, 2022; originally announced April 2022.

Comments: 14 pages, 4 figures, accepted at LREC 2022 (Full Paper)

arXiv:2203.12536 [pdf, other]

Dynamically Refined Regularization for Improving Cross-corpora Hate Speech Detection

Authors: Tulika Bose, Nikolaos Aletras, Irina Illina, Dominique Fohr

Abstract: Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This is due to learning spurious correlations between words that are not necessarily relevant to hateful language, and hate speech labels from the training corpus. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictio… ▽ More Hate speech classifiers exhibit substantial performance degradation when evaluated on datasets different from the source. This is due to learning spurious correlations between words that are not necessarily relevant to hateful language, and hate speech labels from the training corpus. Previous work has attempted to mitigate this problem by regularizing specific terms from pre-defined static dictionaries. While this has been demonstrated to improve the generalizability of classifiers, the coverage of such methods is limited and the dictionaries require regular manual updates from human experts. In this paper, we propose to automatically identify and reduce spurious correlations using attribution methods with dynamic refinement of the list of terms that need to be regularized during training. Our approach is flexible and improves the cross-corpora performance over previous work independently and in combination with pre-defined dictionaries. △ Less

Submitted 23 March, 2022; originally announced March 2022.

Comments: Findings of ACL 2022 preprint

arXiv:2106.07939 [pdf, other]

Attention-based distributed speech enhancement for unconstrained microphone arrays with varying number of nodes

Authors: Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina

Abstract: Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene. However, speech enhancement in ad-hoc microphone arrays still raises many challenges. In particular, the algorithms should be able to handle a variable number of microphones, as some devices in the array might appe… ▽ More Speech enhancement promises higher efficiency in ad-hoc microphone arrays than in constrained microphone arrays thanks to the wide spatial coverage of the devices in the acoustic scene. However, speech enhancement in ad-hoc microphone arrays still raises many challenges. In particular, the algorithms should be able to handle a variable number of microphones, as some devices in the array might appear or disappear. In this paper, we propose a solution that can efficiently process the spatial information captured by the different devices of the microphone array, while being robust to a link failure. To do this, we use an attention mechanism in order to put more weight on the relevant signals sent throughout the array and to neglect the redundant or empty channels. △ Less

Submitted 15 June, 2021; originally announced June 2021.

Journal ref: European Signal Processing Conference (EUSIPCO), IEEE, Aug 2021, Dublin, Ireland

arXiv:2106.00237 [pdf, other]

Improving Automatic Hate Speech Detection with Multiword Expression Features

Authors: Nicolas Zampieri, Irina Illina, Dominique Fohr

Abstract: The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional… ▽ More The task of automatically detecting hate speech in social media is gaining more and more attention. Given the enormous volume of content posted daily, human monitoring of hate speech is unfeasible. In this work, we propose new word-level features for automatic hate speech detection (HSD): multiword expressions (MWEs). MWEs are lexical units greater than a word that have idiomatic and compositional meanings. We propose to integrate MWE features in a deep neural network-based HSD framework. Our baseline HSD system relies on Universal Sentence Encoder (USE). To incorporate MWE features, we create a three-branch deep neural network: one branch for USE, one for MWE categories, and one for MWE embeddings. We conduct experiments on two hate speech tweet corpora with different MWE categories and with two types of MWE embeddings, word2vec and BERT. Our experiments demonstrate that the proposed HSD system with MWE features significantly outperforms the baseline system in terms of macro-F1. △ Less

Submitted 1 June, 2021; originally announced June 2021.

Comments: In Proceedings of NLDB 2021

arXiv:2011.01714 [pdf, other]

DNN-based mask estimation for distributed speech enhancement in spatially unconstrained microphone arrays

Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Abstract: Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based… ▽ More Deep neural network (DNN)-based speech enhancement algorithms in microphone arrays have now proven to be efficient solutions to speech understanding and speech recognition in noisy environments. However, in the context of ad-hoc microphone arrays, many challenges remain and raise the need for distributed processing. In this paper, we propose to extend a previously introduced distributed DNN-based time-frequency mask estimation scheme that can efficiently use spatial information in form of so-called compressed signals which are pre-filtered target estimations. We study the performance of this algorithm under realistic acoustic conditions and investigate practical aspects of its optimal application. We show that the nodes in the microphone array cooperate by taking profit of their spatial coverage in the room. We also propose to use the compressed signals not only to convey the target estimation but also the noise estimation in order to exploit the acoustic diversity recorded throughout the microphone array. △ Less

Submitted 3 November, 2020; originally announced November 2020.

Comments: Submitted to TASLP

arXiv:2011.00982 [pdf, other]

Distributed speech separation in spatially unconstrained microphone arrays

Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Abstract: Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different sources using sophisticated deep neural networks which are very tedious to train. When several microphones are available, spatial information can be exploited to d… ▽ More Speech separation with several speakers is a challenging task because of the non-stationarity of the speech and the strong signal similarity between interferent sources. Current state-of-the-art solutions can separate well the different sources using sophisticated deep neural networks which are very tedious to train. When several microphones are available, spatial information can be exploited to design much simpler algorithms to discriminate speakers. We propose a distributed algorithm that can process spatial information in a spatially unconstrained microphone array. The algorithm relies on a convolutional recurrent neural network that can exploit the signal diversity from the distributed nodes. In a typical case of a meeting room, this algorithm can capture an estimate of each source in a first step and propagate it over the microphone array in order to increase the separation performance in a second step. We show that this approach performs even better when the number of sources and nodes increases. We also study the influence of a mismatch in the number of sources between the training and testing conditions. △ Less

Submitted 8 February, 2021; v1 submitted 2 November, 2020; originally announced November 2020.

Journal ref: ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto, Canada

arXiv:2011.00975 [pdf]

DNN-Based Semantic Model for Rescoring N-best Speech Recognition List

Authors: Dominique Fohr, Irina Illina

Abstract: The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between the training and the testing conditions due to the noise, etc. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to perform this through rescorin… ▽ More The word error rate (WER) of an automatic speech recognition (ASR) system increases when a mismatch occurs between the training and the testing conditions due to the noise, etc. In this case, the acoustic information can be less reliable. This work aims to improve ASR by modeling long-term semantic relations to compensate for distorted acoustic features. We propose to perform this through rescoring of the ASR N-best hypotheses list. To achieve this, we train a deep neural network (DNN). Our DNN rescoring model is aimed at selecting hypotheses that have better semantic consistency and therefore lower WER. We investigate two types of representations as part of input features to our DNN model: static word embeddings (from word2vec) and dynamic contextual embeddings (from BERT). Acoustic and linguistic features are also included. We perform experiments on the publicly available dataset TED-LIUM mixed with real noise. The proposed rescoring approaches give significant improvement of the WER over the ASR system without rescoring models in two noisy conditions and with n-gram and RNNLM. △ Less

Submitted 2 November, 2020; originally announced November 2020.

arXiv:2002.06016 [pdf, other]

DNN-Based Distributed Multichannel Mask Estimation for Speech Enhancement in Microphone Arrays

Authors: Nicolas Furnon, Romain Serizel, Irina Illina, Slim Essid

Abstract: Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions to the real-world. Distributed sensor arrays that consider several devices with a few microphones is a viable alternative that allows for exploiting the multiple devices equipped with microphones that we are using in our everyday life. In this context, we propose to ex… ▽ More Multichannel processing is widely used for speech enhancement but several limitations appear when trying to deploy these solutions to the real-world. Distributed sensor arrays that consider several devices with a few microphones is a viable alternative that allows for exploiting the multiple devices equipped with microphones that we are using in our everyday life. In this context, we propose to extend the distributed adaptive node-specific signal estimation approach to a neural networks framework. At each node, a local filtering is performed to send one signal to the other nodes where a mask is estimated by a neural network in order to compute a global multi-channel Wiener filter. In an array of two nodes, we show that this additional signal can be efficiently taken into account to predict the masks and leads to better speech enhancement performances than when the mask estimation relies only on the local signals. △ Less

Submitted 16 March, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: Submitted to ICASSP2020

Journal ref: International Conference on Audio, Signal and Speech Processing (ICASSP), May 2020, Barcelone, Spain

arXiv:1911.08395 [pdf]

Towards non-toxic landscapes: Automatic toxic comment detection using DNN

Authors: Ashwin Geet D'Sa, Irina Illina, Dominique Fohr

Abstract: The spectacular expansion of the Internet has led to the development of a new research problem in the field of natural language processing: automatic toxic comment detection, since many countries prohibit hate speech in public media. There is no clear and formal definition of hate, offensive, toxic and abusive speeches. In this article, we put all these terms under the umbrella of "toxic" speech.… ▽ More The spectacular expansion of the Internet has led to the development of a new research problem in the field of natural language processing: automatic toxic comment detection, since many countries prohibit hate speech in public media. There is no clear and formal definition of hate, offensive, toxic and abusive speeches. In this article, we put all these terms under the umbrella of "toxic" speech. The contribution of this paper is the design of binary classification and regression-based approaches aiming to predict whether a comment is toxic or not. We compare different unsupervised word representations and different DNN based classifiers. Moreover, we study the robustness of the proposed approaches to adversarial attacks by adding one (healthy or toxic) word. We evaluate the proposed methodology on the English Wikipedia Detox corpus. Our experiments show that using BERT fine-tuning outperforms feature-based BERT, Mikolov's and fastText representations with different DNN classifiers. △ Less

Submitted 16 September, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

Journal ref: In Proceedings of the Second Workshop on Trolling, Aggression and Cyberbullying 2020 May (pp. 21-25)

arXiv:1511.05389 [pdf, other]

Learning to retrieve out-of-vocabulary words in speech recognition

Authors: Imran Sheikh, Irina Illina, Dominique Fohr, Georges Linarès

Abstract: Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To help recovery of the PNs missed by the system, relevant OOV PNs can be retrieved out of the many OOVs by exploiting semantic context of the spoken content. In this paper, we propose two neural network models targeted to retrieve OOV PNs relevant to an audio document: (… ▽ More Many Proper Names (PNs) are Out-Of-Vocabulary (OOV) words for speech recognition systems used to process diachronic audio data. To help recovery of the PNs missed by the system, relevant OOV PNs can be retrieved out of the many OOVs by exploiting semantic context of the spoken content. In this paper, we propose two neural network models targeted to retrieve OOV PNs relevant to an audio document: (a) Document level Continuous Bag of Words (D-CBOW), (b) Document level Continuous Bag of Weighted Words (D-CBOW2). Both these models take document words as input and learn with an objective to maximise the retrieval of co-occurring OOV PNs. With the D-CBOW2 model we propose a new approach in which the input embedding layer is augmented with a context anchor layer. This layer learns to assign importance to input words and has the ability to capture (task specific) key-words in a bag-of-word neural network model. With experiments on French broadcast news videos we show that these two models outperform the baseline methods based on raw embeddings from LDA, Skip-gram and Paragraph Vectors. Combining the D-CBOW and D-CBOW2 models gives faster convergence during training. △ Less

Submitted 1 March, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

Comments: Updated references, added appendix discussing more results; added more discussion, replaced simple phone search results with KWS results; added KWS results for both training phase, probably last update

arXiv:0711.1038 [pdf, ps, other]

Amélioration des Performances des Systèmes Automatiques de Reconnaissance de la Parole pour la Parole Non Native

Authors: Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean-Paul Haton

Abstract: In this article, we present an approach for non native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to the non-native accents. The first method is based on the modification of acoustic models through integration of acoustic models from the mother tong. The phonemes of the target language are pronounced in a similar manner to the native language of spea… ▽ More In this article, we present an approach for non native automatic speech recognition (ASR). We propose two methods to adapt existing ASR systems to the non-native accents. The first method is based on the modification of acoustic models through integration of acoustic models from the mother tong. The phonemes of the target language are pronounced in a similar manner to the native language of speakers. We propose to combine the models of confused phonemes so that the ASR system could recognize both concurrent pronounciations. The second method we propose is a refinment of the pronounciation error detection through the introduction of graphemic constraints. Indeed, non native speakers may rely on the writing of words in their uttering. Thus, the pronounctiation errors might depend on the characters composing the words. The average error rate reduction that we observed is (22.5%) relative for the sentence error rate, and 34.5% (relative) in word error rate. △ Less

Submitted 7 November, 2007; originally announced November 2007.

Journal ref: Dans TAIMA'07, Traitement et Analyse de l'Information : Méthodes et Applications (2007)

arXiv:0711.0811 [pdf, ps, other]

Combined Acoustic and Pronunciation Modelling for Non-Native Speech Recognition

Authors: Ghazi Bouselmi, Dominique Fohr, Irina Illina

Abstract: In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation and HMM models retraining on the HIWIRE foreign accented English speech database. The ``phonetic confusion'' scheme we have developed consists in associating to each spoken phone several sequences of confused phones. In o… ▽ More In this paper, we present several adaptation methods for non-native speech recognition. We have tested pronunciation modelling, MLLR and MAP non-native pronunciation adaptation and HMM models retraining on the HIWIRE foreign accented English speech database. The ``phonetic confusion'' scheme we have developed consists in associating to each spoken phone several sequences of confused phones. In our experiments, we have used different combinations of acoustic models representing the canonical and the foreign pronunciations: spoken and native models, models adapted to the non-native accent with MAP and MLLR. The joint use of pronunciation modelling and acoustic adaptation led to further improvements in recognition accuracy. The best combination of the above mentioned techniques resulted in a relative word error reduction ranging from 46% to 71%. △ Less

Submitted 6 November, 2007; originally announced November 2007.

Journal ref: Dans InterSpeech 2007 (2007)

arXiv:0711.0666 [pdf, ps, other]

Discriminative Phoneme Sequences Extraction for Non-Native Speaker's Origin Classification

Authors: Ghazi Bouselmi, Dominique Fohr, Irina Illina, Jean-Paul Haton

Abstract: In this paper we present an automated method for the classification of the origin of non-native speakers. The origin of non-native speakers could be identified by a human listener based on the detection of typical pronunciations for each nationality. Thus we suppose the existence of several phoneme sequences that might allow the classification of the origin of non-native speakers. Our new method… ▽ More In this paper we present an automated method for the classification of the origin of non-native speakers. The origin of non-native speakers could be identified by a human listener based on the detection of typical pronunciations for each nationality. Thus we suppose the existence of several phoneme sequences that might allow the classification of the origin of non-native speakers. Our new method is based on the extraction of discriminative sequences of phonemes from a non-native English speech database. These sequences are used to construct a probabilistic classifier for the speakers' origin. The existence of discriminative phone sequences in non-native speech is a significant result of this work. The system that we have developed achieved a significant correct classification rate of 96.3% and a significant error reduction compared to some other tested techniques. △ Less

Submitted 5 November, 2007; originally announced November 2007.

Journal ref: Dans ISSPA, International Symposium on Signal Processing and its Applications (2007)

Showing 1–18 of 18 results for author: Illina, I