-
Silent versus modal multi-speaker speech recognition from ultrasound and video
Authors:
Manuel Sam Ribeiro,
Aciel Eshky,
Korin Richmond,
Steve Renals
Abstract:
We investigate multi-speaker speech recognition from ultrasound images of the tongue and video images of the lips. We train our systems on imaging data from modal speech, and evaluate on matched test sets of two speaking modes: silent and modal speech. We observe that silent speech recognition from imaging data underperforms compared to modal speech recognition, likely due to a speaking-mode mismatch between training and testing. We improve silent speech recognition performance using techniques that address the domain mismatch, such as fMLLR and unsupervised model adaptation. We also analyse the properties of silent and modal speech in terms of utterance duration and the size of the articulatory space. To estimate the articulatory space, we compute the convex hull of tongue splines, extracted from ultrasound tongue images. Overall, we observe that the duration of silent speech is longer than that of modal speech, and that silent speech covers a smaller articulatory space than modal speech. Although these two properties are statistically significant across speaking modes, they do not directly correlate with word error rates from speech recognition.
Submitted 27 February, 2021;
originally announced March 2021.
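The abstract above estimates the articulatory space as the convex hull of tongue splines. A minimal sketch of that idea follows, assuming each spline is a list of 2D (x, y) points and that the size of the space is taken as the area of the hull. The function names, the monotone-chain hull routine, and the use of area are illustrative assumptions, not the authors' implementation.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull via Andrew's monotone chain; vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon from the shoelace formula."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def articulatory_space_area(splines):
    """Pool the points of all splines (assumed 2D) and return the hull area."""
    points = [tuple(p) for spline in splines for p in spline]
    return polygon_area(convex_hull(points))


# Hypothetical toy splines, not real ultrasound data.
toy_splines = [[(0, 0), (1, 0), (0.5, 0.5)], [(1, 1), (0, 1)]]
toy_area = articulatory_space_area(toy_splines)
```

Under these assumptions, a speaking mode whose pooled splines give a smaller hull area would be described as covering a smaller articulatory space.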
-
Exploiting ultrasound tongue imaging for the automatic detection of speech articulation errors
Authors:
Manuel Sam Ribeiro,
Joanne Cleland,
Aciel Eshky,
Korin Richmond,
Steve Renals
Abstract:
Speech sound disorders are a common communication impairment in childhood. Because speech disorders can negatively affect the lives and the development of children, clinical intervention is often recommended. To help with diagnosis and treatment, clinicians use instrumented methods such as spectrograms or ultrasound tongue imaging to analyse speech articulations. Analysis with these methods can be laborious for clinicians, so there is growing interest in its automation. In this paper, we investigate the contribution of ultrasound tongue imaging for the automatic detection of speech articulation errors. Our systems are trained on typically developing child speech and augmented with a database of adult speech using audio and ultrasound. Evaluation on typically developing speech indicates that pre-training on adult speech and jointly using ultrasound and audio gives the best results with an accuracy of 86.9%. To evaluate on disordered speech, we collect pronunciation scores from experienced speech and language therapists, focusing on cases of velar fronting and gliding of /r/. The scores show good inter-annotator agreement for velar fronting, but not for gliding errors. For automatic velar fronting error detection, the best results are obtained when jointly using ultrasound and audio. The best system correctly detects 86.6% of the errors identified by experienced clinicians. Out of all the segments identified as errors by the best system, 73.2% match errors identified by clinicians. Results on automatic gliding detection are harder to interpret due to poor inter-annotator agreement, but appear promising. Overall findings suggest that automatic detection of speech articulation errors has potential to be integrated into ultrasound intervention software for automatically quantifying progress during speech therapy.
Submitted 27 February, 2021;
originally announced March 2021.
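The two percentages reported for velar fronting in the abstract above correspond to what is commonly called recall (the share of clinician-identified errors that the system detects) and precision (the share of system-flagged segments that clinicians also marked as errors). The sketch below shows how such figures could be computed from per-segment binary labels. The function name and the toy labels are hypothetical, and the abstract does not describe the authors' scoring procedure in this detail.

```python
def detection_scores(clinician_errors, system_errors):
    """Return (recall, precision) for binary per-segment error labels.

    Both arguments are equal-length sequences of 0/1 (or False/True),
    where 1 marks a segment judged to be an articulation error.
    """
    if len(clinician_errors) != len(system_errors):
        raise ValueError("label sequences must have the same length")
    # Segments flagged by both the clinicians and the system.
    agreed = sum(1 for c, s in zip(clinician_errors, system_errors) if c and s)
    n_clinician = sum(1 for c in clinician_errors if c)
    n_system = sum(1 for s in system_errors if s)
    recall = agreed / n_clinician if n_clinician else 0.0
    precision = agreed / n_system if n_system else 0.0
    return recall, precision


# Hypothetical toy labels for six segments, not data from the paper.
clinician = [1, 1, 1, 0, 0, 0]
system = [1, 1, 0, 1, 0, 0]
toy_recall, toy_precision = detection_scores(clinician, system)
```

Reporting both numbers matters because a system can raise one at the expense of the other, for example by flagging more segments as errors.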
-
The ontogeny of discourse structure mimics the development of literature
Authors:
Natalia Bezerra Mota,
Sylvia Pinheiro,
Mariano Sigman,
Diego Fernandez Slezak,
Guillermo Cecchi,
Mauro Copelli,
Sidarta Ribeiro
Abstract:
Discourse varies with age, education, psychiatric state and historical epoch, but the ontogenetic and cultural dynamics of discourse structure remain to be quantitatively characterized. To this end we investigated word graphs obtained from verbal reports of 200 subjects ages 2-58, and 676 literary texts spanning ~5,000 years. In healthy subjects, lexical diversity, graph size, and long-range recurrence departed from initial near-random levels through a monotonic asymptotic increase across ages, while short-range recurrence showed a corresponding decrease. These changes were explained by education and suggest a hierarchical development of discourse structure: short-range recurrence and lexical diversity stabilize after elementary school, but graph size and long-range recurrence only stabilize after high school. This gradual maturation was blurred in psychotic subjects, who maintained in adulthood a near-random structure. In literature, monotonic asymptotic changes over time were remarkable: While lexical diversity, long-range recurrence and graph size increased away from near-randomness, short-range recurrence declined, from above to below random levels. Bronze Age texts are structurally similar to childish or psychotic discourses, but subsequent texts converge abruptly to the healthy adult pattern around the onset of the Axial Age (800-200 BC), a period of pivotal cultural change. Thus, individually as well as historically, discourse maturation increases the range of word recurrence away from randomness.
Submitted 27 December, 2016;
originally announced December 2016.
-
Quantifying word salad: The structural randomness of verbal reports predicts negative symptoms and Schizophrenia diagnosis 6 months later
Authors:
Natalia B. Mota,
Mauro Copelli,
Sidarta Ribeiro
Abstract:
Background: The precise quantification of negative symptoms is necessary to improve differential diagnosis and prognosis prediction in Schizophrenia. In chronic psychotic patients, the representation of verbal reports as word graphs provides automated sorting of schizophrenia, bipolar disorder and control groups based on the degree of speech connectedness. Here we aim to use machine learning to verify whether speech connectedness during first clinical contact can predict negative symptoms and Schizophrenia diagnosis six months later. Methods: PANSS scores and memory reports were collected from 21 patients undergoing first clinical contact for recent-onset psychosis and followed for 6 months to establish DSM-IV diagnosis, and 21 healthy controls. Each report was represented as a graph in which words corresponded to nodes, and node temporal succession corresponded to edges. Three connectedness attributes were extracted from each graph, z-scores to random graph distributions were measured, correlated with the PANSS negative subscale, combined into a single Fragmentation Index, and used for predictions. Findings: Random-like speech was prevalent among Schizophrenia patients (64% vs. 5% in the control group, p=0.0002). Connectedness explained 92% of the PANSS negative subscale variance (p=0.0001). The Fragmentation Index classified low versus high scores of PANSS negative subscale with 93% accuracy (AUC=1), predicted Schizophrenia diagnosis with 89% accuracy (AUC=0.89), and was validated in an independent cohort of chronic psychotic patients. Interpretation: The structural randomness of speech graph connectedness is increased in Schizophrenia. It provides a quantitative measurement of word salad as a Fragmentation Index that tightly correlates with negative symptoms and predicts Schizophrenia diagnosis during first clinical contact of recent-onset psychosis.
Submitted 30 October, 2016; v1 submitted 26 October, 2016;
originally announced October 2016.
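The abstract above represents each report as a graph whose nodes are words and whose edges link temporally successive words, and then compares graph attributes with random-graph distributions through z-scores. The sketch below illustrates that pipeline under explicit assumptions: the single attribute used here (the number of distinct directed edges) and the random baseline (shuffling the word order of the same report) are illustrative choices, since the abstract does not name the three connectedness attributes or the randomisation procedure. All function names are hypothetical.

```python
import random
import statistics


def word_graph(words):
    """Nodes are distinct words; directed edges link each word to its successor."""
    nodes = set(words)
    edges = set(zip(words, words[1:]))
    return nodes, edges


def edge_count(words):
    """Illustrative attribute (an assumption): number of distinct directed edges."""
    _, edges = word_graph(words)
    return len(edges)


def z_score_vs_shuffles(words, attribute=edge_count, n_shuffles=200, seed=0):
    """Z-score of an attribute against shuffled versions of the same word list."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = attribute(words)
    samples = []
    for _ in range(n_shuffles):
        shuffled = list(words)
        rng.shuffle(shuffled)
        samples.append(attribute(shuffled))
    spread = statistics.pstdev(samples)
    if spread == 0:
        return 0.0  # every shuffle gave the same value; no deviation to measure
    return (observed - statistics.mean(samples)) / spread


# Hypothetical toy report, not data from the study.
toy_words = "the dog saw the dog".split()
toy_nodes, toy_edges = word_graph(toy_words)
toy_z = z_score_vs_shuffles(toy_words)
```

A z-score near zero would indicate that the report's structure is close to that of its shuffled versions, which is the sense of "random-like" assumed in this sketch.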
-
Spike Avalanches Exhibit Universal Dynamics across the Sleep-Wake Cycle
Authors:
Tiago L. Ribeiro,
Mauro Copelli,
Fábio Caixeta,
Hindiael Belchior,
Dante R. Chialvo,
Miguel A. L. Nicolelis,
Sidarta Ribeiro
Abstract:
Scale-invariant neuronal avalanches have been observed in cell cultures and slices as well as anesthetized and awake brains, suggesting that the brain operates near criticality, i.e. within a narrow margin between avalanche propagation and extinction. In theory, criticality provides many desirable features for the behaving brain, optimizing computational capabilities, information transmission, sensitivity to sensory stimuli and size of memory repertoires. However, a thorough characterization of neuronal avalanches in freely-behaving (FB) animals is still missing, thus raising doubts about their relevance for brain function. To address this issue, we employed chronically implanted multielectrode arrays (MEA) to record avalanches of spikes from the cerebral cortex (V1 and S1) and hippocampus (HP) of 14 rats, as they spontaneously traversed the wake-sleep cycle, explored novel objects or were subjected to anesthesia (AN). We then modeled spike avalanches to evaluate the impact of sparse MEA sampling on their statistics. We found that the size distributions of spike avalanches are well fit by lognormal distributions in FB animals, and by truncated power laws in the AN group. The FB data are also characterized by multiple key features compatible with criticality in the temporal domain, such as 1/f spectra and long-term correlations as measured by detrended fluctuation analysis. These signatures are very stable across waking, slow-wave sleep and rapid-eye-movement sleep, but collapse during anesthesia. Likewise, waiting time distributions obey a single scaling function during all natural behavioral states, but not during anesthesia. Results are equivalent for neuronal ensembles recorded from V1, S1 and HP. Altogether, the data provide a comprehensive link between behavior and brain criticality, revealing a unique scale-invariant regime of spike avalanches across all major behaviors.
Submitted 10 January, 2011;
originally announced January 2011.
-
Mutual information in random Boolean models of regulatory networks
Authors:
Andre S. Ribeiro,
Stuart A. Kauffman,
Jason Lloyd-Price,
Björn Samuelsson,
Joshua E. S. Socolar
Abstract:
The amount of mutual information contained in time series of two elements gives a measure of how well their activities are coordinated. In a large, complex network of interacting elements, such as a genetic regulatory network within a cell, the average of the mutual information over all pairs <I> is a global measure of how well the system can coordinate its internal dynamics. We study this average pairwise mutual information in random Boolean networks (RBNs) as a function of the distribution of Boolean rules implemented at each element, assuming that the links in the network are randomly placed. Efficient numerical methods for calculating <I> show that as the number of network nodes N approaches infinity, the quantity N<I> exhibits a discontinuity at parameter values corresponding to critical RBNs. For finite systems it peaks near the critical value, but slightly in the disordered regime for typical parameter variations. The source of high values of N<I> is the indirect correlations between pairs of elements from different long chains with a common starting point. The contribution from pairs that are directly linked approaches zero for critical networks and peaks deep in the disordered regime.
Submitted 15 November, 2007; v1 submitted 24 July, 2007;
originally announced July 2007.
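The abstract above measures coordination through the mutual information between the time series of two elements and averages it over all pairs to obtain <I>. The sketch below estimates both quantities from observed state sequences using empirical frequencies. It assumes same-time pairing of the two series, unordered pairs of distinct elements, and information measured in bits; the abstract does not specify any of these, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(x, y):
    """Empirical mutual information (in bits) between two equal-length series."""
    if len(x) != len(y) or not x:
        raise ValueError("series must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Each term compares the joint probability with the product of marginals.
        mi += p_ab * log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def average_pairwise_mi(series):
    """Mean mutual information over all unordered pairs of elements."""
    pairs = list(combinations(range(len(series)), 2))
    if not pairs:
        raise ValueError("need at least two series")
    return sum(mutual_information(series[i], series[j]) for i, j in pairs) / len(pairs)


# Hypothetical toy Boolean time series, not output of a random Boolean network.
a = [0, 1, 0, 1]
b = [0, 1, 0, 1]   # identical to a: fully coordinated
c = [0, 0, 1, 1]   # statistically independent of a in this toy sample
toy_average = average_pairwise_mi([a, b, c])
```

In this toy sample the identical pair contributes one bit and the two independent pairs contribute nothing, so the average is one third of a bit.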