-
Exploring a Gamified Personality Assessment Method through Interaction with Multi-Personality LLM Agents
Authors:
Baiqiao Zhang,
Xiangxian Li,
Chao Zhou,
Xinyu Gai,
Zhifeng Liao,
Juan Liu,
Xue Yang,
Niqi Liu,
Xiaojuan Ma,
Yong-jin Liu,
Yulong Bian
Abstract:
The execution of effective and imperceptible personality assessments is receiving increasing attention in psychology and human-computer interaction fields. This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of gamified personality assessment through multi-personality representations (Multi-PR GP…
▽ More
The execution of effective and imperceptible personality assessments is receiving increasing attention in psychology and human-computer interaction fields. This study explores an interactive approach for personality assessment, focusing on the multiplicity of personality representation. We propose a framework of gamified personality assessment through multi-personality representations (Multi-PR GPA). The framework leverages Large Language Models to empower virtual agents with diverse personalities. These agents elicit multifaceted human personality representations through engaging in interactive games. Drawing upon the multi-type textual data generated throughout the interaction, it achieves two ways of personality assessments (i.e., Direct Assessment and Que-based Assessment) and provides interpretable insights. Grounded in the classic Big Five theory, we implemented a prototype system and conducted a user study to assess the efficacy of Multi-PR GPA. The results underscore the effectiveness of our approach in personality assessment and demonstrate that it achieves superior performance when considering the multiplicity of personality representation.
△ Less
Submitted 5 July, 2025;
originally announced July 2025.
-
V2T-CoT: From Vision to Text Chain-of-Thought for Medical Reasoning and Diagnosis
Authors:
Yuan Wang,
Jiaxiang Liu,
Shujian Gao,
Bin Feng,
Zhihang Tang,
Xiaotang Gai,
Jian Wu,
Zuozhu Liu
Abstract:
Recent advances in multimodal techniques have led to significant progress in Medical Visual Question Answering (Med-VQA). However, most existing models focus on global image features rather than localizing disease-specific regions crucial for diagnosis. Additionally, current research tends to emphasize answer accuracy at the expense of the reasoning pathway, yet both are crucial for clinical decis…
▽ More
Recent advances in multimodal techniques have led to significant progress in Medical Visual Question Answering (Med-VQA). However, most existing models focus on global image features rather than localizing disease-specific regions crucial for diagnosis. Additionally, current research tends to emphasize answer accuracy at the expense of the reasoning pathway, yet both are crucial for clinical decision-making. To address these challenges, we propose From Vision to Text Chain-of-Thought (V2T-CoT), a novel approach that automates the localization of preference areas within biomedical images and incorporates this localization into region-level pixel attention as knowledge for Vision CoT. By fine-tuning the vision language model on constructed R-Med 39K dataset, V2T-CoT provides definitive medical reasoning paths. V2T-CoT integrates visual grounding with textual rationale generation to establish precise and explainable diagnostic results. Experimental results across four Med-VQA benchmarks demonstrate state-of-the-art performance, achieving substantial improvements in both performance and interpretability.
△ Less
Submitted 27 June, 2025; v1 submitted 24 June, 2025;
originally announced June 2025.
-
3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
Authors:
Xiaotang Gai,
Jiaxiang Liu,
Yichen Li,
Zijie Meng,
Jian Wu,
Zuozhu Liu
Abstract:
Medical Visual Question Answering (Med-VQA) holds significant potential for clinical decision support, yet existing efforts primarily focus on 2D imaging with limited task diversity. This paper presents 3D-RAD, a large-scale dataset designed to advance 3D Med-VQA using radiology CT scans. The 3D-RAD dataset encompasses six diverse VQA tasks: anomaly detection, image observation, medical computatio…
▽ More
Medical Visual Question Answering (Med-VQA) holds significant potential for clinical decision support, yet existing efforts primarily focus on 2D imaging with limited task diversity. This paper presents 3D-RAD, a large-scale dataset designed to advance 3D Med-VQA using radiology CT scans. The 3D-RAD dataset encompasses six diverse VQA tasks: anomaly detection, image observation, medical computation, existence detection, static temporal diagnosis, and longitudinal temporal diagnosis. It supports both open- and closed-ended questions while introducing complex reasoning challenges, including computational tasks and multi-stage temporal analysis, to enable comprehensive benchmarking. Extensive evaluations demonstrate that existing vision-language models (VLMs), especially medical VLMs exhibit limited generalization, particularly in multi-temporal tasks, underscoring the challenges of real-world 3D diagnostic reasoning. To drive future advancements, we release a high-quality training set 3D-RAD-T of 136,195 expert-aligned samples, showing that fine-tuning on this dataset could significantly enhance model performance. Our dataset and code, aiming to catalyze multimodal medical AI research and establish a robust foundation for 3D medical visual understanding, are publicly available at https://github.com/Tang-xiaoxiao/M3D-RAD.
△ Less
Submitted 11 June, 2025;
originally announced June 2025.
-
FairSteer: Inference Time Debiasing for LLMs with Dynamic Activation Steering
Authors:
Yichen Li,
Zhiting Fan,
Ruizhe Chen,
Xiaotang Gai,
Luqi Gong,
Yan Zhang,
Zuozhu Liu
Abstract:
Large language models (LLMs) are prone to capturing biases from training corpus, leading to potential negative social impacts. Existing prompt-based debiasing methods exhibit instability due to their sensitivity to prompt changes, while fine-tuning-based techniques incur substantial computational overhead and catastrophic forgetting. In this paper, we propose FairSteer, a novel inference-time debi…
▽ More
Large language models (LLMs) are prone to capturing biases from training corpus, leading to potential negative social impacts. Existing prompt-based debiasing methods exhibit instability due to their sensitivity to prompt changes, while fine-tuning-based techniques incur substantial computational overhead and catastrophic forgetting. In this paper, we propose FairSteer, a novel inference-time debiasing framework without requiring customized prompt design or model retraining. Motivated by the linear representation hypothesis, our preliminary investigation demonstrates that fairness-related features can be encoded into separable directions in the hidden activation space. FairSteer operates in three steps: biased activation detection, debiasing steering vector (DSV) computation, and dynamic activation steering. Specifically, it first trains a lightweight linear classifier to detect bias signatures in activations, and then computes DSVs as intervention directions derived from small contrastive prompt pairs. Subsequently, it performs debiasing by adjusting activations with DSVs in the inference stage. Comprehensive evaluation with six LLMs demonstrates the superiority of FairSteer across question-answering, counterfactual input evaluation and open-ended text generation tasks. Code will be released.
△ Less
Submitted 5 July, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
Subtype-Aware Registration of Longitudinal Electronic Health Records
Authors:
Xin Gai,
Shiyi Jiang,
Anru R. Zhang
Abstract:
Electronic Health Records (EHRs) contain extensive patient information that can inform downstream clinical decisions, such as mortality prediction, disease phenotyping, and disease onset prediction. A key challenge in EHR data analysis is the temporal gap between when a condition is first recorded and its actual onset time. Such timeline misalignment can lead to artificially distinct biomarker tre…
▽ More
Electronic Health Records (EHRs) contain extensive patient information that can inform downstream clinical decisions, such as mortality prediction, disease phenotyping, and disease onset prediction. A key challenge in EHR data analysis is the temporal gap between when a condition is first recorded and its actual onset time. Such timeline misalignment can lead to artificially distinct biomarker trends among patients with similar disease progression, undermining the reliability of downstream analysis and complicating tasks like disease subtyping. To address this challenge, we provide a subtype-aware timeline registration method that leverages data projection and discrete optimization to simultaneously correct timeline misalignment and improve disease subtyping. Through simulation and real-world data analyses, we demonstrate that the proposed method effectively aligns distorted observed records with the true disease progression patterns, enhancing subtyping clarity and improving performance in downstream clinical analyses.
△ Less
Submitted 13 January, 2025;
originally announced January 2025.
-
BugBlitz-AI: An Intelligent QA Assistant
Authors:
Yi Yao,
Jun Wang,
Yabai Hu,
Lifeng Wang,
Yi Zhou,
Jack Chen,
Xuming Gai,
Zhenming Wang,
Wenjun Liu
Abstract:
The evolution of software testing from manual to automated methods has significantly influenced quality assurance (QA) practices. However, challenges persist in post-execution phases, particularly in result analysis and reporting. Traditional post-execution validation phases require manual intervention for result analysis and report generation, leading to inefficiencies and potential development c…
▽ More
The evolution of software testing from manual to automated methods has significantly influenced quality assurance (QA) practices. However, challenges persist in post-execution phases, particularly in result analysis and reporting. Traditional post-execution validation phases require manual intervention for result analysis and report generation, leading to inefficiencies and potential development cycle delays. This paper introduces BugBlitz-AI, an AI-powered validation toolkit designed to enhance end-to-end test automation by automating result analysis and bug reporting processes. BugBlitz-AI leverages recent advancements in artificial intelligence to reduce the time-intensive tasks of manual result analysis and report generation, allowing QA teams to focus more on crucial aspects of product quality. By adopting BugBlitz-AI, organizations can advance automated testing practices and integrate AI into QA processes, ensuring higher product quality and faster time-to-market. The paper outlines BugBlitz-AI's architecture, discusses related work, details its quality enhancement strategies, and presents results demonstrating its effectiveness in real-world scenarios.
△ Less
Submitted 17 May, 2024;
originally announced June 2024.
-
Functional Post-Clustering Selective Inference with Applications to EHR Data Analysis
Authors:
Zihan Zhu,
Xin Gai,
Anru R. Zhang
Abstract:
In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post…
▽ More
In electronic health records (EHR) analysis, clustering patients according to patterns in their data is crucial for uncovering new subtypes of diseases. Existing medical literature often relies on classical hypothesis testing methods to test for differences in means between these clusters. Due to selection bias induced by clustering algorithms, the implementation of these classical methods on post-clustering data often leads to an inflated type-I error. In this paper, we introduce a new statistical approach that adjusts for this bias when analyzing data collected over time. Our method extends classical selective inference methods for cross-sectional data to longitudinal data. We provide theoretical guarantees for our approach with upper bounds on the selective type-I and type-II errors. We apply the method to simulated data and real-world Acute Kidney Injury (AKI) EHR datasets, thereby illustrating the advantages of our approach.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale
Authors:
Xiaotang Gai,
Chenyi Zhou,
Jiaxiang Liu,
Yang Feng,
Jian Wu,
Zuozhu Liu
Abstract:
Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, represents a challenging task and significant advancement in healthcare. It assists medical experts to swiftly interpret medical images, thereby enabling faster and more accurate diagnoses. However, the model interpretability and transparency of existing MedVQA solutions are often limited,…
▽ More
Medical Visual Question Answering (MedVQA), which offers language responses to image-based medical inquiries, represents a challenging task and significant advancement in healthcare. It assists medical experts to swiftly interpret medical images, thereby enabling faster and more accurate diagnoses. However, the model interpretability and transparency of existing MedVQA solutions are often limited, posing challenges in understanding their decision-making processes. To address this issue, we devise a semi-automated annotation process to streamline data preparation and build new benchmark MedVQA datasets R-RAD, R-SLAKE and R-Path. These datasets provide intermediate medical decision-making rationales generated by multimodal large language models and human annotations for question-answering pairs in existing MedVQA datasets, i.e., VQA-RAD, SLAKE and PathVQA. Moreover, we design a novel framework, MedThink, which finetunes lightweight pretrained generative models by incorporating medical decision-making rationales. MedThink includes three distinct strategies to generate decision outcomes and corresponding rationales, thereby clearly showcasing the medical decision-making process during reasoning. Our comprehensive experiments show that our method achieves an accuracy of 83.5% on R-RAD, 86.3% on R-SLAKE and 87.2% on R-Path. These results significantly exceed those of existing state-of-the-art models with comparable parameters. Datasets and code will be released.
△ Less
Submitted 7 October, 2024; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Soft Phenotyping for Sepsis via EHR Time-aware Soft Clustering
Authors:
Shiyi Jiang,
Xin Gai,
Miriam Treggiari,
William W. Stead,
Yuankang Zhao,
C. David Page,
Anru R. Zhang
Abstract:
Objective: Sepsis is one of the most serious hospital conditions associated with high mortality. Sepsis is the result of a dysregulated immune response to infection that can lead to multiple organ dysfunction and death. Due to the wide variability in the causes of sepsis, clinical presentation, and the recovery trajectories, identifying sepsis sub-phenotypes is crucial to advance our understanding…
▽ More
Objective: Sepsis is one of the most serious hospital conditions associated with high mortality. Sepsis is the result of a dysregulated immune response to infection that can lead to multiple organ dysfunction and death. Due to the wide variability in the causes of sepsis, clinical presentation, and the recovery trajectories, identifying sepsis sub-phenotypes is crucial to advance our understanding of sepsis characterization, to choose targeted treatments and optimal timing of interventions, and to improve prognostication. Prior studies have described different sub-phenotypes of sepsis using organ-specific characteristics. These studies applied clustering algorithms to electronic health records (EHRs) to identify disease sub-phenotypes. However, prior approaches did not capture temporal information and made uncertain assumptions about the relationships among the sub-phenotypes for clustering procedures.
Methods: We developed a time-aware soft clustering algorithm guided by clinical variables to identify sepsis sub-phenotypes using data available in the EHR.
Results: We identified six novel sepsis hybrid sub-phenotypes and evaluated them for medical plausibility. In addition, we built an early-warning sepsis prediction model using logistic regression.
Conclusion: Our results suggest that these novel sepsis hybrid sub-phenotypes are promising to provide more accurate information on sepsis-related organ dysfunction and sepsis recovery trajectories which can be important to inform management decisions and sepsis prognosis.
△ Less
Submitted 5 May, 2024; v1 submitted 14 November, 2023;
originally announced November 2023.
-
A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis
Authors:
Jiaxiang Liu,
Tianxiang Hu,
Yan Zhang,
Xiaotang Gai,
Yang Feng,
Zuozhu Liu
Abstract:
Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great pe…
▽ More
Zero-shot medical image classification is a critical process in real-world scenarios where we have limited access to all possible diseases or large-scale annotated data. It involves computing similarity scores between a query medical image and possible disease categories to determine the diagnostic result. Recent advances in pretrained vision-language models (VLMs) such as CLIP have shown great performance for zero-shot natural image recognition and exhibit benefits in medical applications. However, an explainable zero-shot medical image recognition framework with promising performance is yet under development. In this paper, we propose a novel CLIP-based zero-shot medical image classification framework supplemented with ChatGPT for explainable diagnosis, mimicking the diagnostic process performed by human experts. The key idea is to query large language models (LLMs) with category names to automatically generate additional cues and knowledge, such as disease symptoms or descriptions other than a single category name, to help provide more accurate and explainable diagnosis in CLIP. We further design specific prompts to enhance the quality of generated texts by ChatGPT that describe visual medical features. Extensive results on one private dataset and four public datasets along with detailed analysis demonstrate the effectiveness and explainability of our training-free zero-shot diagnosis pipeline, corroborating the great potential of VLMs and LLMs for medical applications.
△ Less
Submitted 4 July, 2023;
originally announced July 2023.