
Do self-supervised speech and language models extract similar representations as human brain?

Authors: Peili Chen, Linyang He, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Abstract: Speech and language models trained through self-supervised learning (SSL) demonstrate strong alignment with brain activity during speech and language perception. However, given their distinct training modalities, it remains unclear whether they correlate with the same neural aspects. We directly address this question by evaluating the brain prediction performance of two representative SSL models, Wav2Vec2.0 and GPT-2, designed for speech and language tasks. Our findings reveal that both models accurately predict speech responses in the auditory cortex, with a significant correlation between their brain predictions. Notably, shared speech contextual information between Wav2Vec2.0 and GPT-2 accounts for the majority of explained variance in brain activity, surpassing static semantic and lower-level acoustic-phonetic information. These results underscore the convergence of speech contextual representations in SSL models and their alignment with the neural network underlying speech perception, offering valuable insights into both SSL models and the neural basis of speech and language processing.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2310.04644 [pdf, other]

Neural2Speech: A Transfer Learning Framework for Neural-Driven Speech Reconstruction

Authors: Jiawei Li, Chunxu Guo, Li Fu, Lu Fan, Edward F. Chang, Yuanning Li

Abstract: Reconstructing natural speech from neural activity is vital for enabling direct communication via brain-computer interfaces. Previous efforts have explored the conversion of neural recordings into speech using complex deep neural network (DNN) models trained on extensive neural recording data, which is resource-intensive under regular clinical constraints. However, achieving satisfactory performance in reconstructing speech from limited-scale neural recordings has been challenging, mainly due to the complexity of speech representations and the neural data constraints. To overcome these challenges, we propose a novel transfer learning framework for neural-driven speech reconstruction, called Neural2Speech, which consists of two distinct training phases. First, a speech autoencoder is pre-trained on readily available speech corpora to decode speech waveforms from the encoded speech representations. Second, a lightweight adaptor is trained on the small-scale neural recordings to align the neural activity and the speech representation for decoding. Remarkably, our proposed Neural2Speech demonstrates the feasibility of neural-driven speech reconstruction even with only 20 minutes of intracranial data, which significantly outperforms existing baseline methods in terms of speech fidelity and intelligibility.

Submitted 31 January, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: To appear in 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

arXiv:2009.14293 [pdf]

doi 10.1016/j.brs.2021.01.023

Evidence of state-dependence in the effectiveness of responsive neurostimulation for seizure modulation

Authors: Sharon Chiang, Ankit N. Khambhati, Emily T. Wang, Marina Vannucci, Edward F. Chang, Vikram R. Rao

Abstract: An implanted device for brain-responsive neurostimulation (RNS System) is approved as an effective treatment to reduce seizures in adults with medically-refractory focal epilepsy. Clinical trials of the RNS System demonstrate population-level reduction in average seizure frequency, but therapeutic response is highly variable. Recent evidence links seizures to cyclical fluctuations in underlying risk. We tested the hypothesis that effectiveness of responsive neurostimulation varies based on current state within cyclical risk fluctuations. We analyzed retrospective data from 25 adults with medically-refractory focal epilepsy implanted with the RNS System. Chronic electrocorticography was used to record electrographic seizures, and hidden Markov models decoded seizures into fluctuations in underlying risk. State-dependent associations of RNS System stimulation parameters with changes in risk were estimated. Higher charge density was associated with improved outcomes, both for remaining in a low seizure risk state and for transitioning from a high to a low seizure risk state. The effect of stimulation frequency depended on initial seizure risk state: when starting in a low risk state, higher stimulation frequencies were associated with remaining in a low risk state, but when starting in a high risk state, lower stimulation frequencies were associated with transition to a low risk state. Findings were consistent across bipolar and monopolar stimulation configurations. The impact of RNS on seizure frequency exhibits state-dependence, such that stimulation parameters which are effective in one seizure risk state may not be effective in another. These findings represent conceptual advances in understanding the therapeutic mechanism of RNS, and directly inform current practices of RNS tuning and the development of next-generation neurostimulation systems.

Submitted 18 February, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

Journal ref: Brain Stimulation (2021); 14(2):366-375

arXiv:1912.05869 [pdf, other]

On Neural Phone Recognition of Mixed-Source ECoG Signals

Authors: Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang

Abstract: The emerging field of neural speech recognition (NSR) using electrocorticography has recently attracted remarkable research interest for studying how human brains recognize speech in quiet and noisy surroundings. In this study, we demonstrate the utility of NSR systems to objectively prove the ability of human beings to attend to a single speech source while suppressing the interfering signals in a simulated cocktail party scenario. The experimental results show that the relative degradation of the NSR system performance when tested in a mixed-source scenario is significantly lower than that of automatic speech recognition (ASR). In this paper, we have significantly enhanced the performance of our recently published framework by using manual alignments for initialization instead of the flat start technique. We have also improved the NSR system performance by accounting for the possible transcription mismatch between the acoustic and neural signals.

Submitted 12 December, 2019; originally announced December 2019.

Comments: 5 pages, showing algorithms, results and references from our collaboration during a 2017 postdoc stay of the first author

arXiv:1909.01401 [pdf, other]

Brain2Char: A Deep Architecture for Decoding Text from Brain Recordings

Authors: Pengfei Sun, Gopala K. Anumanchipalli, Edward F. Chang

Abstract: Decoding language representations directly from the brain can enable new Brain-Computer Interfaces (BCI) for high bandwidth human-human and human-machine communication. Clinically, such technologies can restore communication in people with neurological conditions affecting their ability to speak. In this study, we propose a novel deep network architecture, Brain2Char, for directly decoding text (specifically character sequences) from direct brain recordings (called Electrocorticography, ECoG). The Brain2Char framework combines state-of-the-art deep learning modules: 3D Inception layers for multiband spatiotemporal feature extraction from neural data and bidirectional recurrent layers, dilated convolution layers followed by language model weighted beam search to decode character sequences, optimizing a connectionist temporal classification (CTC) loss. Additionally, given the highly non-linear transformations that underlie the conversion of cortical function to character sequences, we perform regularizations on the network's latent representations motivated by insights into cortical encoding of speech production and artifactual aspects specific to ECoG data acquisition. To do this, we impose auxiliary losses on latent representations for articulatory movements, speech acoustics and session-specific non-linearities. In 3 participants tested here, Brain2Char achieves 10.6%, 8.5% and 7.0% Word Error Rates (WER) respectively on vocabulary sizes ranging from 1200 to 1900 words. Brain2Char also performs well when 2 participants silently mimed sentences. These results set a new state-of-the-art on decoding text from brain and demonstrate the potential of Brain2Char as a high-performance communication BCI.

Submitted 3 September, 2019; originally announced September 2019.

arXiv:1805.08889 [pdf, other]

Spiking Linear Dynamical Systems on Neuromorphic Hardware for Low-Power Brain-Machine Interfaces

Authors: David G. Clark, Jesse A. Livezey, Edward F. Chang, Kristofer E. Bouchard

Abstract: Neuromorphic architectures achieve low-power operation by using many simple spiking neurons in lieu of traditional hardware. Here, we develop methods for precise linear computations in spiking neural networks and use these methods to map the evolution of a linear dynamical system (LDS) onto an existing neuromorphic chip: IBM's TrueNorth. We analytically characterize, and numerically validate, the discrepancy between the spiking LDS state sequence and that of its non-spiking counterpart. These analytical results shed light on the multiway tradeoff between time, space, energy, and accuracy in neuromorphic computation. To demonstrate the utility of our work, we implemented a neuromorphic Kalman filter (KF) and used it for offline decoding of human vocal pitch from neural data. The neuromorphic KF could be used for low-power filtering in domains beyond neuroscience, such as navigation or robotics.

Submitted 5 June, 2018; v1 submitted 22 May, 2018; originally announced May 2018.

Comments: 23 pages, 8 figures; added reference, removed typo in Fig. 2

arXiv:1803.09807 [pdf, other]

doi 10.1371/journal.pcbi.1007091

Deep learning as a tool for neural data analysis: speech classification and cross-frequency coupling in human sensorimotor cortex

Authors: Jesse A. Livezey, Kristofer E. Bouchard, Edward F. Chang

Abstract: A fundamental challenge in neuroscience is to understand what structure in the world is represented in spatially distributed patterns of neural activity from multiple single-trial measurements. This is often accomplished by learning a simple, linear transformation between neural features and features of the sensory stimuli or motor task. While successful in some early sensory processing areas, linear mappings are unlikely to be ideal tools for elucidating nonlinear, hierarchical representations of higher-order brain areas during complex tasks, such as the production of speech by humans. Here, we apply deep networks to predict produced speech syllables from cortical surface electric potentials recorded from human sensorimotor cortex. We found that deep networks had higher decoding prediction accuracy compared to baseline models, and also exhibited greater improvements in accuracy with increasing dataset size. We further demonstrate that deep networks' confusions revealed hierarchical latent structure in the neural data, which recapitulated the underlying articulatory nature of speech motor control. Finally, we used deep networks to compare task-relevant information in different neural frequency bands, and found that the high-gamma band contains the vast majority of information relevant for the speech prediction task, with little to no additional contribution from lower frequencies. Together, these results demonstrate the utility of deep networks as a data analysis tool for neuroscience.

Submitted 26 March, 2018; originally announced March 2018.

Comments: 23 pages, 9 figures

arXiv:1505.00041 [pdf, ps, other]

Modeling neural activity at the ensemble level

Authors: Joaquin Rapela, Mark Kostuk, Peter F. Rowat, Tim Mullen, Edward F. Chang, Kristofer Bouchard

Abstract: Here we demonstrate that the activity of neural ensembles can be quantitatively modeled. We first show that an ensemble dynamical model (EDM) accurately approximates the distribution of voltages and average firing rate per neuron of a population of simulated integrate-and-fire neurons. EDMs are high-dimensional nonlinear dynamical models. To facilitate the estimation of their parameters we present a dimensionality reduction method and study its performance with simulated data. We then introduce and evaluate a maximum-likelihood method to estimate connectivity parameters in networks of EDMs. Finally, we show that this model and methods accurately approximate the high-gamma power evoked by pure tones in the auditory cortex of rodents. Overall, this article demonstrates that quantitatively modeling brain activity at the ensemble level is indeed possible, and opens the way to understanding the computations performed by neural ensembles, which could revolutionize our understanding of brain function.

Submitted 3 September, 2015; v1 submitted 30 April, 2015; originally announced May 2015.

Showing 1–8 of 8 results for author: Chang, E F