Search | arXiv e-print repository

Bridging Auditory Perception and Language Comprehension through MEG-Driven Encoding Models

Authors: Matteo Ciferri, Matteo Ferrante, Nicola Toschi

Abstract: Understanding the neural mechanisms behind auditory and linguistic processing is key to advancing cognitive neuroscience. In this study, we use Magnetoencephalography (MEG) data to analyze brain responses to spoken language stimuli. We develop two distinct encoding models: an audio-to-MEG encoder, which uses time-frequency decompositions (TFD) and wav2vec2 latent space representations, and a text-… ▽ More Understanding the neural mechanisms behind auditory and linguistic processing is key to advancing cognitive neuroscience. In this study, we use Magnetoencephalography (MEG) data to analyze brain responses to spoken language stimuli. We develop two distinct encoding models: an audio-to-MEG encoder, which uses time-frequency decompositions (TFD) and wav2vec2 latent space representations, and a text-to-MEG encoder, which leverages CLIP and GPT-2 embeddings. Both models successfully predict neural activity, demonstrating significant correlations between estimated and observed MEG signals. However, the text-to-MEG model outperforms the audio-based model, achieving higher Pearson Correlation (PC) score. Spatially, we identify that auditory-based embeddings (TFD and wav2vec2) predominantly activate lateral temporal regions, which are responsible for primary auditory processing and the integration of auditory signals. In contrast, textual embeddings (CLIP and GPT-2) primarily engage the frontal cortex, particularly Broca's area, which is associated with higher-order language processing, including semantic integration and language production, especially in the 8-30 Hz frequency range. The strong involvement of these regions suggests that auditory stimuli are processed through more direct sensory pathways, while linguistic information is encoded via networks that integrate meaning and cognitive control. Our results reveal distinct neural pathways for auditory and linguistic information processing, with higher encoding accuracy for text representations in the frontal regions. These insights refine our understanding of the brain's functional architecture in processing auditory and textual information, offering quantitative advancements in the modelling of neural responses to complex language stimuli. △ Less

Submitted 22 December, 2024; originally announced January 2025.

Comments: 10 pages, 4 figures, Accepted at ICLR2024 Workshop TS4H

arXiv:2406.15537 [pdf, other]

R&B -- Rhythm and Brain: Cross-subject Decoding of Music from Human Brain Activity

Authors: Matteo Ferrante, Matteo Ciferri, Nicola Toschi

Abstract: Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of music… ▽ More Music is a universal phenomenon that profoundly influences human experiences across cultures. This study investigates whether music can be decoded from human brain activity measured with functional MRI (fMRI) during its perception. Leveraging recent advancements in extensive datasets and pre-trained computational models, we construct mappings between neural data and latent representations of musical stimuli. Our approach integrates functional and anatomical alignment techniques to facilitate cross-subject decoding, addressing the challenges posed by the low temporal resolution and signal-to-noise ratio (SNR) in fMRI data. Starting from the GTZan fMRI dataset, where five participants listened to 540 musical stimuli from 10 different genres while their brain activity was recorded, we used the CLAP (Contrastive Language-Audio Pretraining) model to extract latent representations of the musical stimuli and developed voxel-wise encoding models to identify brain regions responsive to these stimuli. By applying a threshold to the association between predicted and actual brain activity, we identified specific regions of interest (ROIs) which can be interpreted as key players in music processing. Our decoding pipeline, primarily retrieval-based, employs a linear map to project brain activity to the corresponding CLAP features. This enables us to predict and retrieve the musical stimuli most similar to those that originated the fMRI data. Our results demonstrate state-of-the-art identification accuracy, with our methods significantly outperforming existing approaches. Our findings suggest that neural-based music retrieval systems could enable personalized recommendations and therapeutic applications. Future work could use higher temporal resolution neuroimaging and generative models to improve decoding accuracy and explore the neural underpinnings of music perception and emotion. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: The first two authors contributed equally to this work

arXiv:2402.07242 [pdf, ps, other]

A Differentiable Model for Optimizing the Genetic Drivers of Synaptogenesis

Authors: Tommaso Boccato, Matteo Ferrante, Nicola Toschi

Abstract: There is growing consensus among neuroscientists that neural circuits critical for survival are the result of genomic decompression processes. We introduce SynaptoGen, a novel computational framework--member of the Connectome Models family--bringing synthetic biological intelligence closer, facilitating neural biological agent development through precise genetic control of synaptogenesis. SynaptoG… ▽ More There is growing consensus among neuroscientists that neural circuits critical for survival are the result of genomic decompression processes. We introduce SynaptoGen, a novel computational framework--member of the Connectome Models family--bringing synthetic biological intelligence closer, facilitating neural biological agent development through precise genetic control of synaptogenesis. SynaptoGen is the first model of its kind offering mechanistic explanation of synaptic multiplicity based on genetic expression and protein interaction probabilities. The framework connects genetic factors through a differentiable function, working as a neural network where synaptic weights equal average numbers of synapses between neurons, multiplied by conductance, derived from genetic profiles. Differentiability enables gradient-based optimization, allowing generation of genetic expression patterns producing pre-wired biological agents for specific tasks. Validation in simulated synaptogenesis scenarios shows agents successfully solving four reinforcement learning benchmarks, consistently surpassing control baselines. Despite gaps in biological realism requiring mitigation, this framework has potential to accelerate synthetic biological intelligence research. △ Less

Submitted 7 September, 2025; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2309.07149 [pdf, other]

Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models

Authors: Matteo Ferrante, Tommaso Boccato, Stefano Bargione, Nicola Toschi

Abstract: Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that employs to classify and reconstruct images from the ImageNet dataset using electroencephalography (EEG) data from subjects that had viewed the images themselves (i.e. "brain decoding"). We analyze… ▽ More Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that employs to classify and reconstruct images from the ImageNet dataset using electroencephalography (EEG) data from subjects that had viewed the images themselves (i.e. "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images which had elicited EEG activity. Therefore, our architecture not only decodes images from neural activity but also offers a credible image reconstruction from EEG only, paving the way for e.g. swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition. △ Less

Submitted 8 September, 2023; originally announced September 2023.

arXiv:2309.00627 [pdf, other]

Through their eyes: multi-subject Brain Decoding with simple alignment techniques

Authors: Matteo Ferrante, Tommaso Boccato, Nicola Toschi

Abstract: Previous brain decoding research primarily involves single-subject studies, reconstructing stimuli via fMRI activity from the same subject. Our study aims to introduce a generalization technique for cross-subject brain decoding, facilitated by exploring data alignment methods. We utilized the NSD dataset, a comprehensive 7T fMRI vision experiment involving multiple subjects exposed to 9841 images,… ▽ More Previous brain decoding research primarily involves single-subject studies, reconstructing stimuli via fMRI activity from the same subject. Our study aims to introduce a generalization technique for cross-subject brain decoding, facilitated by exploring data alignment methods. We utilized the NSD dataset, a comprehensive 7T fMRI vision experiment involving multiple subjects exposed to 9841 images, 982 of which were viewed by all. Our approach involved training a decoding model on one subject, aligning others' data to this space, and testing the decoding on the second subject. We compared ridge regression, hyper alignment, and anatomical alignment techniques for fMRI data alignment. We established that cross-subject brain decoding is feasible, even using around 10% of the total data, or 982 common images, with comparable performance to single-subject decoding. Ridge regression was the best method for functional alignment. Through subject alignment, we achieved superior brain decoding and a potential 90% reduction in scan time. This could pave the way for more efficient experiments and further advancements in the field, typically requiring an exorbitant 20-hour scan time per subject. △ Less

Submitted 1 August, 2023; originally announced September 2023.

arXiv:2212.06726 [pdf, other]

Semantic Brain Decoding: from fMRI to conceptually similar image reconstruction of visual stimuli

Authors: Matteo Ferrante, Tommaso Boccato, Nicola Toschi

Abstract: Brain decoding is a field of computational neuroscience that uses measurable brain activity to infer mental states or internal representations of perceptual inputs. Therefore, we propose a novel approach to brain decoding that also relies on semantic and contextual similarity. We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence o… ▽ More Brain decoding is a field of computational neuroscience that uses measurable brain activity to infer mental states or internal representations of perceptual inputs. Therefore, we propose a novel approach to brain decoding that also relies on semantic and contextual similarity. We employ an fMRI dataset of natural image vision and create a deep learning decoding pipeline inspired by the existence of both bottom-up and top-down processes in human vision. We train a linear brain-to-feature model to map fMRI activity features to visual stimuli features, assuming that the brain projects visual information onto a space that is homeomorphic to the latent space represented by the last convolutional layer of a pretrained convolutional neural network, which typically collects a variety of semantic features that summarize and highlight similarities and differences between concepts. These features are then categorized in the latent space using a nearest-neighbor strategy, and the results are used to condition a generative latent diffusion model to create novel images. From fMRI data only, we produce reconstructions of visual stimuli that match the original content very well on a semantic level, surpassing the state of the art in previous literature. We evaluate our work and obtain good results using a quantitative semantic metric (the Wu-Palmer similarity metric over the WordNet lexicon, which had an average value of 0.57) and perform a human evaluation experiment that resulted in correct evaluation, according to the multiplicity of human criteria in evaluating image similarity, in over 80% of the test set. △ Less

Submitted 22 March, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2003.03290 [pdf, other]

Towards a predictive spatio-temporal representation of brain data

Authors: Tiago Azevedo, Luca Passamonti, Pietro Liò, Nicola Toschi

Abstract: The characterisation of the brain as a "connectome", in which the connections are represented by correlational values across timeseries and as summary measures derived from graph theory analyses, has been very popular in the last years. However, although this representation has advanced our understanding of the brain function, it may represent an oversimplified model. This is because the typical f… ▽ More The characterisation of the brain as a "connectome", in which the connections are represented by correlational values across timeseries and as summary measures derived from graph theory analyses, has been very popular in the last years. However, although this representation has advanced our understanding of the brain function, it may represent an oversimplified model. This is because the typical fMRI datasets are constituted by complex and highly heterogeneous timeseries that vary across space (i.e., location of brain regions). We compare various modelling techniques from deep learning and geometric deep learning to pave the way for future research in effectively leveraging the rich spatial and temporal domains of typical fMRI datasets, as well as of other similar datasets. As a proof-of-concept, we compare our approaches in the homogeneous and publicly available Human Connectome Project (HCP) dataset on a supervised binary classification task. We hope that our methodological advances relative to previous "connectomic" measures can ultimately be clinically and computationally relevant by leading to a more nuanced understanding of the brain dynamics in health and disease. Such understanding of the brain can fundamentally reduce the constant specialised clinical expertise in order to accurately understand brain variability. △ Less

Submitted 29 February, 2020; originally announced March 2020.

Comments: To appear in the Workshop on AI for Affordable Healthcare (AI4AH) at ICLR 2020. 8 pages, 2 figures

Showing 1–7 of 7 results for author: Toschi, N