-
UniMind: Unleashing the Power of LLMs for Unified Multi-Task Brain Decoding
Authors:
Weiheng Lu,
Chunfeng Song,
Jiamin Wu,
Pengyu Zhu,
Yuchen Zhou,
Weijian Mai,
Qihao Zheng,
Wanli Ouyang
Abstract:
Decoding human brain activity from electroencephalography (EEG) signals is a central challenge at the intersection of neuroscience and artificial intelligence, enabling diverse applications in mental state assessment, clinical monitoring, and human-machine interaction. Recent efforts have extensively explored EEG-based brain foundation models for generalized brain decoding, employing large-scale training on multiple datasets. However, most of these attempts struggle with generalizability and fail to achieve satisfactory performance without task-specific tuning due to pronounced inherent heterogeneity among decoding tasks. To address these challenges, we present UniMind, a general-purpose EEG foundation model for unified multi-task brain decoding by uniquely unleashing the power of large language models to comprehend complex neural patterns. UniMind offers several advantages. First, we design a Neuro-Language Connector to bridge the modality gap between neural signals and large language models, distilling and transforming the spatiotemporal neural patterns of EEG data into representations understandable by language models. Second, a Task-aware Query Selection module is proposed to inject task-awareness into the cross-modal alignment by dynamically generating task-adaptive query tokens, enabling learning of task-relevant neural patterns across diverse tasks. Extensive experiments across ten datasets demonstrate that UniMind substantially outperforms state-of-the-art multi-task decoding models, with an average gain of 12 percent, while also offering valuable neuroscientific insights into neural functional correlations across tasks. The code will be made publicly available.
Submitted 23 June, 2025;
originally announced June 2025.
-
Learning Personalized Utility Functions for Drivers in Ride-hailing Systems Using Ensemble Hypernetworks
Authors:
Weiming Mai,
Jie Gao,
Oded Cats
Abstract:
In ride-hailing systems, drivers decide whether to accept or reject ride requests based on factors such as order characteristics, traffic conditions, and personal preferences. Accurately predicting these decisions is essential for improving the efficiency and reliability of these systems. Traditional models, such as the Random Utility Maximization (RUM) approach, typically predict drivers' decisions by assuming linear correlations among attributes. However, these models often fall short because they fail to account for non-linear interactions between attributes and do not cater to the unique, personalized preferences of individual drivers. In this paper, we develop a method for learning personalized utility functions using hypernetworks and ensemble learning. Hypernetworks dynamically generate weights for a linear utility function based on trip request data and driver profiles, capturing the non-linear relationships. An ensemble of hypernetworks trained on different data segments further improves model adaptability and generalization by introducing controlled randomness, thereby reducing over-fitting. We validate the performance of our ensemble hypernetworks model in terms of prediction accuracy and uncertainty estimation on a real-world dataset. The results demonstrate that our approach not only accurately predicts each driver's utility but also effectively balances the needs for explainability and uncertainty quantification. Additionally, our model serves as a powerful tool for revealing the personalized preferences of different drivers, clearly illustrating which attributes most strongly affect their ride acceptance decisions.
Submitted 21 June, 2025;
originally announced June 2025.
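The hypernetwork idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the two-layer tanh hypernetwork, the logistic link from utility to acceptance probability, the random untrained parameters, and every name and feature value below are assumptions made purely for illustration. Each hypernetwork maps a context vector (driver profile plus request features) to the weights of a linear utility, and the spread across ensemble members serves as a rough uncertainty estimate.

```python
import math
import random


def make_hypernet(n_context, n_attrs, hidden, rng):
    """Create random (untrained) parameters for a two-layer hypernetwork."""
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_context)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(n_attrs)]
    return w1, w2


def generate_weights(hypernet, context):
    """Map a context vector to the weights of a linear utility function."""
    w1, w2 = hypernet
    hidden = [math.tanh(sum(w * c for w, c in zip(row, context))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def accept_probability(hypernet, context, attrs):
    """Linear utility with context-dependent weights, then a logistic link (assumed)."""
    weights = generate_weights(hypernet, context)
    utility = sum(w * a for w, a in zip(weights, attrs))
    return 1.0 / (1.0 + math.exp(-utility))


def ensemble_predict(hypernets, context, attrs):
    """Return the mean and standard deviation of the members' predictions."""
    probs = [accept_probability(h, context, attrs) for h in hypernets]
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, math.sqrt(var)


rng = random.Random(0)
ensemble = [make_hypernet(3, 2, 4, rng) for _ in range(5)]
context = [0.2, 1.0, -0.5]  # hypothetical driver-profile and request features
attrs = [1.5, -0.3]         # hypothetical trip attributes (e.g. fare, pickup distance)
mean_p, std_p = ensemble_predict(ensemble, context, attrs)
print(f"acceptance probability: {mean_p:.3f} +/- {std_p:.3f}")
```

Because the utility stays linear in the trip attributes, the generated weights can be read directly as per-driver attribute importances, which is the explainability property the abstract refers to.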
-
MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
Authors:
Yuqin Dai,
Zhouheng Yao,
Chunfeng Song,
Qihao Zheng,
Weijian Mai,
Kunyu Peng,
Shuai Lu,
Wanli Ouyang,
Jian Yang,
Jiamin Wu
Abstract:
Brain decoding aims to reconstruct the visual perception of human subjects from fMRI signals, which is crucial for understanding the brain's perception mechanisms. Existing methods are confined to the single-subject paradigm due to substantial brain variability, which leads to weak generalization across individuals and incurs high training costs, exacerbated by the limited availability of fMRI data. To address these challenges, we propose MindAligner, an explicit functional alignment framework for cross-subject brain decoding from limited fMRI data. MindAligner offers several merits. First, we learn a Brain Transfer Matrix (BTM) that projects the brain signals of an arbitrary new subject to one of the known subjects, enabling seamless use of pre-trained decoding models. Second, to facilitate reliable BTM learning, a Brain Functional Alignment module is proposed to perform soft cross-subject brain alignment under different visual stimuli with a multi-level brain alignment loss, uncovering fine-grained functional correspondences with high interpretability. Experiments indicate that MindAligner not only outperforms existing methods in visual decoding under data-limited conditions, but also provides valuable neuroscience insights in cross-subject functional analysis. The code will be made publicly available.
Submitted 7 February, 2025;
originally announced February 2025.
-
You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense
Authors:
Wuyuao Mai,
Geng Hong,
Pei Chen,
Xudong Pan,
Baojun Liu,
Yuan Zhang,
Haixin Duan,
Min Yang
Abstract:
Generative large language models (LLMs) such as LLaMA and ChatGPT have significantly transformed daily life and work by providing advanced insights. However, as jailbreak attacks continue to circumvent built-in safety mechanisms, exploiting carefully crafted scenarios or tokens, the safety risks of LLMs have come into focus. While numerous defense strategies, such as prompt detection, modification, and model fine-tuning, have been proposed to counter these attacks, a critical question arises: do these defenses compromise the utility and usability of LLMs for legitimate users? Existing research predominantly focuses on the effectiveness of defense strategies without thoroughly examining their impact on performance, leaving a gap in understanding the trade-offs between LLM safety and performance. Our research addresses this gap by conducting a comprehensive study on the utility degradation, safety elevation, and exaggerated-safety escalation of LLMs with jailbreak defense strategies. We propose USEBench, a novel benchmark designed to evaluate these aspects, along with USEIndex, a comprehensive metric for assessing overall model performance. Through experiments on seven state-of-the-art LLMs, we found that mainstream jailbreak defenses fail to ensure both safety and performance simultaneously. Although model fine-tuning performs best overall, its effectiveness varies across LLMs. Furthermore, vertical comparisons reveal that developers commonly prioritize performance over safety when iterating or fine-tuning their LLMs.
Submitted 21 January, 2025;
originally announced January 2025.
-
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Authors:
Zhanqiang Guo,
Jiamin Wu,
Yonghao Song,
Jiahui Bu,
Weijian Mai,
Qihao Zheng,
Wanli Ouyang,
Chunfeng Song
Abstract:
Human perception of the visual world is shaped by the stereo processing of 3D information. Understanding how the brain perceives and processes 3D visual stimuli in the real world has been a longstanding endeavor in neuroscience. Towards this goal, we introduce a new neuroscience task: decoding 3D visual perception from EEG, a neuroimaging technique that enables real-time monitoring of neural dynamics enriched with complex visual cues. To provide the essential benchmark, we first present EEG-3D, a pioneering dataset featuring multimodal analysis data and extensive EEG recordings from 12 subjects viewing 72 categories of 3D objects rendered in both videos and images. Furthermore, we propose Neuro-3D, a 3D visual decoding framework based on EEG signals. This framework adaptively integrates EEG features derived from static and dynamic stimuli to learn complementary and robust neural representations, which are subsequently used to recover both the shape and color of 3D objects through the proposed diffusion-based colored point cloud decoder. To the best of our knowledge, we are the first to explore EEG-based 3D visual decoding. Experiments indicate that Neuro-3D not only reconstructs colored 3D objects with high fidelity, but also learns effective neural representations that enable insightful brain region analysis. The dataset and associated code will be made publicly available.
Submitted 21 November, 2024; v1 submitted 19 November, 2024;
originally announced November 2024.
-
EMOdiffhead: Continuously Emotional Control in Talking Head Generation via Diffusion
Authors:
Jian Zhang,
Weijian Mai,
Zhijun Zhang
Abstract:
The task of audio-driven portrait animation involves generating a talking head video using an identity image and an audio track of speech. While many existing approaches focus on lip synchronization and video quality, few tackle the challenge of generating emotion-driven talking head videos. The ability to control and edit emotions is essential for producing expressive and realistic animations. In response to this challenge, we propose EMOdiffhead, a novel method for emotional talking head video generation that not only enables fine-grained control of emotion categories and intensities but also enables one-shot generation. Given the FLAME 3D model's linearity in expression modeling, we utilize the DECA method to extract expression vectors, which are combined with audio to guide a diffusion model in generating videos with precise lip synchronization and rich emotional expressiveness. This approach not only enables the learning of rich facial information from emotion-irrelevant data but also facilitates the generation of emotional videos. It effectively overcomes the limitations of emotional data, such as the lack of diversity in facial and background information, and addresses the absence of emotional details in emotion-irrelevant data. Extensive experiments and user studies demonstrate that our approach achieves state-of-the-art performance compared to other emotional portrait animation methods.
Submitted 11 September, 2024;
originally announced September 2024.
-
Brain-Conditional Multimodal Synthesis: A Survey and Taxonomy
Authors:
Weijian Mai,
Jian Zhang,
Pengfei Fang,
Zhijun Zhang
Abstract:
In the era of Artificial Intelligence Generated Content (AIGC), conditional multimodal synthesis technologies (e.g., text-to-image, text-to-video, text-to-audio) are gradually reshaping the natural content in the real world. The key to multimodal synthesis technology is to establish the mapping relationship between different modalities. Brain signals, serving as potential reflections of how the brain interprets external information, exhibit a distinctive One-to-Many correspondence with various external modalities. This correspondence makes brain signals a promising guiding condition for multimodal content synthesis. Brain-conditional multimodal synthesis refers to decoding brain signals back to perceptual experience, which is crucial for developing practical brain-computer interface systems and unraveling complex mechanisms underlying how the brain perceives and comprehends external stimuli. This survey comprehensively examines the emerging field of AIGC-based Brain-conditional Multimodal Synthesis, termed AIGC-Brain, to delineate the current landscape and future directions. To begin, related brain neuroimaging datasets, functional brain regions, and mainstream generative models are introduced as the foundation of AIGC-Brain decoding and analysis. Next, we provide a comprehensive taxonomy for AIGC-Brain decoding models and present task-specific representative work and detailed implementation strategies to facilitate comparison and in-depth analysis. Quality assessments are then introduced for both qualitative and quantitative evaluation. Finally, this survey explores insights gained, providing current challenges and outlining prospects of AIGC-Brain. Being the inaugural survey in this domain, this paper paves the way for the progress of AIGC-Brain research, offering a foundational overview to guide future work.
Submitted 3 January, 2024; v1 submitted 31 December, 2023;
originally announced January 2024.
-
UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity
Authors:
Weijian Mai,
Zhijun Zhang
Abstract:
Image reconstruction and captioning from brain activity evoked by visual stimuli allow researchers to further understand the connection between the human brain and the visual perception system. While deep generative models have recently been employed in this field, reconstructing realistic captions and images with both low-level details and high semantic fidelity is still a challenging problem. In this work, we propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity. For the first time, we unify image reconstruction and captioning from visual-evoked functional magnetic resonance imaging (fMRI) through a latent diffusion model termed Versatile Diffusion. Specifically, we transform fMRI voxels into text and image latents for low-level information and guide the backward diffusion process through fMRI-based image and text conditions derived from CLIP to generate realistic captions and images. UniBrain outperforms current methods both qualitatively and quantitatively in terms of image reconstruction and reports image captioning results for the first time on the Natural Scenes Dataset (NSD). Moreover, the ablation experiments and functional region-of-interest (ROI) analysis further exhibit the superiority of UniBrain and provide comprehensive insight for visual-evoked brain decoding.
Submitted 14 August, 2023;
originally announced August 2023.
-
Contact tracing Inspired Efficient Computation by Energy Tracing
Authors:
Wending Mai,
Ronald P. Jenkins,
Yifan Chen,
Douglas H. Werner
Abstract:
Inspired by the epidemic contact tracing technique, we propose a method to efficiently solve electromagnetics problems by tracing the energy distribution. The computational domain is adaptively decomposed, and the available computational resources are focused on the energy-active (infected) domains and their adjacent (exposed) domains, while avoiding the unnecessary computation of energy-null (unexposed) domains. As an example, we employ this method to solve several optics problems. The proposed method shows high efficiency while maintaining good accuracy. The energy tracing method is based on the causality principle, and is therefore potentially transferable to other areas of computational physics and their associated algorithms.
Submitted 8 August, 2022; v1 submitted 9 July, 2022;
originally announced July 2022.
-
KinD-LCE Curve Estimation And Retinex Fusion On Low-Light Image
Authors:
Xiaochun Lei,
Weiliang Mai,
Junlin Xie,
He Liu,
Zetao Jiang,
Zhaoting Gong,
Chang Lu,
Linjun Lu
Abstract:
Low-light images often suffer from noise and color distortion. Object detection, semantic segmentation, instance segmentation, and other tasks are challenging when working with low-light images because of image noise and chromatic aberration. We also found that the conventional Retinex theory loses information in adjusting the image for low-light tasks. In response to these problems, this paper proposes an algorithm for low-illumination enhancement. The proposed method, KinD-LCE, uses a light curve estimation module to enhance the illumination map in the Retinex-decomposed image, improving the overall image brightness. An illumination map and reflection map fusion module is also proposed to restore the image details and reduce detail loss. Additionally, a total variation (TV) loss function is applied to eliminate noise. Our method was trained on the GladNet dataset, known for its diverse collection of low-light images, tested against the Low-Light dataset, and evaluated using the ExDark dataset for downstream tasks, demonstrating competitive performance with a PSNR of 19.7216 and SSIM of 0.8213.
Submitted 23 October, 2023; v1 submitted 19 July, 2022;
originally announced July 2022.
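Two quantities named in the KinD-LCE abstract above, the total variation (TV) loss and PSNR, have standard textbook definitions that can be sketched in a few lines. The code below is a generic illustration on tiny grayscale images stored as nested lists. It assumes the anisotropic (L1) form of TV and a pixel range of [0, 1], and it is not taken from the paper, whose exact loss formulation may differ.

```python
import math


def total_variation(img):
    """Anisotropic TV: sum of absolute differences between adjacent pixels."""
    rows, cols = len(img), len(img[0])
    tv = 0.0
    for r in range(rows):
        for c in range(cols):
            if r + 1 < rows:  # vertical neighbour
                tv += abs(img[r + 1][c] - img[r][c])
            if c + 1 < cols:  # horizontal neighbour
                tv += abs(img[r][c + 1] - img[r][c])
    return tv


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    n = len(reference) * len(reference[0])
    mse = sum(
        (a - b) ** 2
        for row_ref, row_est in zip(reference, estimate)
        for a, b in zip(row_ref, row_est)
    ) / n
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


flat = [[0.5, 0.5], [0.5, 0.5]]   # hypothetical clean image
noisy = [[0.5, 0.6], [0.4, 0.5]]  # same image with two perturbed pixels

print(total_variation(flat), total_variation(noisy))
print(psnr(flat, noisy))
```

A smooth image has low TV and a noisy one has high TV, which is why penalizing TV during training discourages noise; PSNR rises as the enhanced image gets closer to the reference.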