
Showing 1–50 of 703 results for author: Tan, Y

Searching in archive cs.
  1. arXiv:2507.01841  [pdf, ps, other]

    cs.LG cs.IT eess.SP math.OC

    Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization

    Authors: Yihang Gao, Vincent Y. F. Tan

    Abstract: In this paper, we propose SubLoRA, a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function maximization. In contrast to prior approaches, such as AdaLoRA, that rely on first-order (linearized) approximations of the loss function, SubLoRA utilizes second-order information to capture the potentially complex loss landscape by incorporating the Hessian matrix. We show t…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2507.00469  [pdf, ps, other]

    cs.CV cs.LG

    Bisecle: Binding and Separation in Continual Learning for Video Language Understanding

    Authors: Yue Tan, Xiaoqian Hu, Hao Xue, Celso De Melo, Flora D. Salim

    Abstract: Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic scenes captured by wearable glasses), necessitating models to continually adapt to shifting data distributions and novel scenarios. Considering the prohibitive computational costs of fine-tuning models…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: 23 pages, 12 figures, 10 tables

  3. arXiv:2506.21046  [pdf, ps, other]

    cs.CV cs.CR

    Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features

    Authors: Shangbo Wu, Yu-an Tan, Ruinan Ma, Wencong Ma, Dehua Zhu, Yuanzhang Li

    Abstract: The ability of deep neural networks (DNNs) comes from extracting and interpreting features from the data provided. By exploiting intermediate features in DNNs instead of relying on hard labels, we craft adversarial perturbations that generalize more effectively, boosting black-box transferability. These features ubiquitously come from supervised learning in previous work. Inspired by the exceptional…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 14 pages, 9 figures, to appear in ICCV 2025

  4. arXiv:2506.19257  [pdf, ps, other]

    cs.CV cs.CL

    MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models

    Authors: Yinan Xia, Yilei Jiang, Yingshui Tan, Xiaoyong Zhu, Xiangyu Yue, Bo Zheng

    Abstract: Vision-Language Models (VLMs) have achieved remarkable progress in multimodal reasoning tasks through enhanced chain-of-thought capabilities. However, this advancement also introduces novel safety risks, as these models become increasingly vulnerable to harmful multimodal prompts that can trigger unethical or unsafe behaviors. Existing safety alignment approaches, primarily designed for unimodal l…

    Submitted 23 June, 2025; originally announced June 2025.

  5. arXiv:2506.18278  [pdf, ps, other]

    math.OC cs.IT cs.LG

    Finite-Time Information-Theoretic Bounds in Queueing Control

    Authors: Yujie Liu, Vincent Y. F. Tan, Yunbei Xu

    Abstract: We establish the first finite-time information-theoretic lower bounds, and derive new policies that achieve them, for the total queue length in scheduling problems over stochastic processing networks with both adversarial and stochastic arrivals. Prior analyses of MaxWeight guarantee only stability and asymptotic optimality in heavy traffic; we prove that, at finite horizons, MaxWeight can incur str…

    Submitted 23 June, 2025; originally announced June 2025.

  6. arXiv:2506.18140  [pdf, ps, other]

    cs.CV

    See-in-Pairs: Reference Image-Guided Comparative Vision-Language Models for Medical Diagnosis

    Authors: Ruinan Jin, Gexin Huang, Xinwei Shen, Qiong Zhang, Yan Shuo Tan, Xiaoxiao Li

    Abstract: Medical imaging diagnosis presents inherent challenges due to diseases that mimic normal anatomy and exhibit significant inter-patient variability. Clinicians routinely employ comparative reasoning, using reference images from healthy controls or previous patient examinations, to discern subtle yet diagnostically critical abnormalities. However, existing medical vision-language models (VLMs) focus p…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 25 pages, four figures

  7. arXiv:2506.15643  [pdf, ps, other]

    stat.ML cs.LG

    Revisiting Randomization in Greedy Model Search

    Authors: Xin Chen, Jason M. Klusowski, Yan Shuo Tan, Chang Yu

    Abstract: Combining randomized estimators in an ensemble, such as via random forests, has become a fundamental technique in modern data science, but can be computationally expensive. Furthermore, the mechanism by which this improves predictive performance is poorly understood. We address these issues in the context of sparse linear regression by proposing and analyzing an ensemble of greedy forward selectio…

    Submitted 18 June, 2025; originally announced June 2025.

  8. arXiv:2506.15442  [pdf, ps, other]

    cs.CV cs.AI

    Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

    Authors: Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, Haolin Liu, Yunfei Zhao, Qingxiang Lin, Zeqiang Lai, Xianghui Yang, Huiwen Shi, Zibo Zhao, Bowen Zhang, Hongyu Yan, Lifu Wang, Sicong Liu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Yulin Cai, Jiaao Yu, et al. (28 additional authors not shown)

    Abstract: 3D AI-generated content (AIGC) is a passionate field that has significantly accelerated the creation of 3D models in gaming, film, and design. Despite the development of several groundbreaking models that have revolutionized 3D generation, the field remains largely accessible only to researchers, developers, and designers due to the complexities involved in collecting, processing, and training 3D…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: Github link: https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1

  9. arXiv:2506.12441  [pdf, ps, other]

    cs.CV cs.AI

    MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation

    Authors: Caixu Xu, Junming Wei, Huizhen Chen, Pengchen Liang, Bocheng Liang, Ying Tan, Xintong Wei

    Abstract: Recently, Mamba-based methods have become popular in medical image segmentation due to their lightweight design and long-range dependency modeling capabilities. However, current segmentation methods frequently encounter challenges in fetal ultrasound images, such as enclosed anatomical structures, blurred boundaries, and small anatomical structures. To address the need for balancing local feature…

    Submitted 14 June, 2025; originally announced June 2025.

  10. arXiv:2506.11253  [pdf, ps, other]

    cs.CV cs.LG

    Lifting Data-Tracing Machine Unlearning to Knowledge-Tracing for Foundation Models

    Authors: Yuwen Tan, Boqing Gong

    Abstract: Machine unlearning removes certain training data points and their influence on AI models (e.g., when a data owner revokes their decision to allow models to learn from the data). In this position paper, we propose to lift data-tracing machine unlearning to knowledge-tracing for foundation models (FMs). We support this position based on practical needs and insights from cognitive studies. Practicall…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: 21 pages, 3 figures

  11. arXiv:2506.09467  [pdf, ps, other]

    cs.DB

    ArcNeural: A Multi-Modal Database for the Gen-AI Era

    Authors: Wu Min, Qiao Yuncong, Yu Tan, Chenghu Yang

    Abstract: ArcNeural introduces a novel multimodal database tailored for the demands of Generative AI and Large Language Models, enabling efficient management of diverse data types such as graphs, vectors, and documents. Its storage-compute separated architecture integrates graph technology, advanced vector indexing, and transaction processing to support real-time analytics and AI-driven applications. Key fe…

    Submitted 11 June, 2025; originally announced June 2025.

  12. arXiv:2506.08979  [pdf, other]

    cs.CV cs.RO

    Rethinking Range-View LiDAR Segmentation in Adverse Weather

    Authors: Longyu Yang, Ping Hu, Lu Zhang, Jun Liu, Yap-Peng Tan, Heng Tao Shen, Xiaofeng Zhu

    Abstract: LiDAR segmentation has emerged as an important task to enrich multimedia experiences and analysis. Range-view-based methods have gained popularity due to their high computational efficiency and compatibility with real-time deployment. However, their generalized performance under adverse weather conditions remains underexplored, limiting their reliability in real-world environments. In this work, w…

    Submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.08534  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    DCD: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber View

    Authors: Donglian Li, Hui Guo, Minglang Chen, Huizhen Chen, Jialing Chen, Bocheng Liang, Pengchen Liang, Ying Tan

    Abstract: Accurate segmentation of anatomical structures in the apical four-chamber (A4C) view of fetal echocardiography is essential for early diagnosis and prenatal evaluation of congenital heart disease (CHD). However, precise segmentation remains challenging due to ultrasound artifacts, speckle noise, anatomical variability, and boundary ambiguity across different gestational stages. To reduce the workl…

    Submitted 10 June, 2025; originally announced June 2025.

  14. arXiv:2506.07971  [pdf, ps, other]

    cs.CV

    CyberV: Cybernetics for Test-time Scaling in Video Understanding

    Authors: Jiahao Meng, Shuyang Sun, Yue Tan, Lu Qi, Yunhai Tong, Xiangtai Li, Longyin Wen

    Abstract: Current Multimodal Large Language Models (MLLMs) may struggle with understanding long or complex videos due to computational demands at test time, lack of robustness, and limited accuracy, primarily stemming from their feed-forward processing nature. These limitations could be more severe for models with fewer parameters. To address these limitations, we propose a novel framework inspired by cyber…

    Submitted 9 June, 2025; originally announced June 2025.

  15. arXiv:2506.07542  [pdf]

    cs.CV cs.AI

    APTOS-2024 challenge report: Generation of synthetic 3D OCT images from fundus photographs

    Authors: Bowen Liu, Weiyi Zhang, Peranut Chotcomwongse, Xiaolan Chen, Ruoyu Chen, Pawin Pakaymaskul, Niracha Arjkongharn, Nattaporn Vongsa, Xuelian Cheng, Zongyuan Ge, Kun Huang, Xiaohui Li, Yiru Duan, Zhenbang Wang, BaoYe Xie, Qiang Chen, Huazhu Fu, Michael A. Mahr, Jiaqi Qu, Wangyiyang Chen, Shiye Wang, Yubo Tan, Yongjie Li, Mingguang He, Danli Shi, et al. (1 additional author not shown)

    Abstract: Optical Coherence Tomography (OCT) provides high-resolution, 3D, and non-invasive visualization of retinal layers in vivo, serving as a critical tool for lesion localization and disease diagnosis. However, its widespread adoption is limited by equipment costs and the need for specialized operators. In comparison, 2D color fundus photography offers faster acquisition and greater accessibility with…

    Submitted 9 June, 2025; originally announced June 2025.

  16. arXiv:2506.07431  [pdf, ps, other]

    cs.CV cs.AI

    FAMSeg: Fetal Femur and Cranial Ultrasound Segmentation Using Feature-Aware Attention and Mamba Enhancement

    Authors: Jie He, Minglang Chen, Minying Lu, Bocheng Liang, Junming Wei, Guiyan Peng, Jiaxi Chen, Ying Tan

    Abstract: Accurate ultrasound image segmentation is a prerequisite for precise biometrics and accurate assessment. Relying on manual delineation introduces significant errors and is time-consuming. However, existing segmentation models are designed based on objects in natural scenes, making them difficult to adapt to ultrasound objects with high noise and high similarity. This is particularly evident in sma…

    Submitted 14 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  17. arXiv:2506.06873  [pdf, ps, other]

    cs.LG stat.ML

    Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning

    Authors: Armin Behnamnia, Gholamali Aminian, Alireza Aghaei, Chengchun Shi, Vincent Y. F. Tan, Hamid R. Rabiee

    Abstract: Off-policy learning and evaluation leverage logged bandit feedback datasets, which contain context, action, propensity score, and feedback for each data point. These scenarios face significant challenges due to high variance and poor performance with low-quality propensity scores and heavy-tailed reward distributions. We address these issues by introducing a novel estimator based on the log-sum-ex…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted as spotlight poster in ICML 2025

  18. arXiv:2506.06821  [pdf, ps, other]

    cs.CL cs.AI cs.SE

    Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

    Authors: Yuhan Cao, Zian Chen, Kun Quan, Ziliang Zhang, Yu Wang, Xiaoning Dong, Yeqi Feng, Guanzhong He, Jingcheng Huang, Jianhao Li, Yixuan Tan, Jiafu Tang, Yilin Tang, Junlei Wu, Qianyu Xiao, Can Zheng, Shouchen Zhou, Yuxiang Zhu, Yiming Huang, Tian Xie, Tianxing He

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation, capable of tackling complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test case generation remains largely unexplored. We investigate this problem from the perspective of competition-level programming (CP) programs and propose TCGBench, a…

    Submitted 10 June, 2025; v1 submitted 7 June, 2025; originally announced June 2025.

    Comments: 37 pages, 22 figures

  19. arXiv:2506.06122  [pdf, ps, other]

    cs.LG cs.DC

    Reinforcement Learning Optimization for Large-Scale Learning: An Efficient and User-Friendly Scaling Library

    Authors: Weixun Wang, Shaopan Xiong, Gengru Chen, Wei Gao, Sheng Guo, Yancheng He, Ju Huang, Jiaheng Liu, Zhendong Li, Xiaoyang Li, Zichen Liu, Haizhou Zhao, Dakai An, Lunxi Cao, Qiyang Cao, Wanxi Deng, Feilei Du, Yiliang Gu, Jiahe Li, Xiang Li, Mingjie Liu, Yijia Luo, Zihe Liu, Yadao Wang, Pei Wang, et al. (16 additional authors not shown)

    Abstract: We introduce ROLL, an efficient, scalable, and user-friendly library designed for Reinforcement Learning Optimization for Large-scale Learning. ROLL caters to three primary user groups: tech pioneers aiming for cost-effective, fault-tolerant large-scale training, developers requiring flexible control over training workflows, and researchers seeking agile experimentation. ROLL is built upon several…

    Submitted 6 June, 2025; originally announced June 2025.

    Comments: 16 pages

  20. arXiv:2506.03373  [pdf, ps, other]

    cs.CV cs.AI

    A Foundation Model for Spatial Proteomics

    Authors: Muhammad Shaban, Yuzhou Chang, Huaying Qiu, Yao Yu Yeo, Andrew H. Song, Guillaume Jaume, Yuchen Wang, Luca L. Weishaupt, Tong Ding, Anurag Vaidya, Abdallah Lamane, Daniel Shao, Mohammed Zidane, Yunhao Bai, Paige McCallum, Shuli Luo, Wenrui Wu, Yang Wang, Precious Cramer, Chi Ngai Chan, Pierre Stephan, Johanna Schaffenrath, Jia Le Lee, Hendrik A. Michel, Caiwei Tian, et al. (35 additional authors not shown)

    Abstract: Foundation models have begun to transform image analysis by acting as pretrained generalist backbones that can be adapted to many tasks even when post-training data are limited, yet their impact on spatial proteomics, imaging that maps proteins at single-cell resolution, remains limited. Here, we introduce KRONOS, a foundation model built for spatial proteomics. KRONOS was trained in a self-superv…

    Submitted 3 June, 2025; originally announced June 2025.

  21. arXiv:2506.02386  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    Asymptotically Optimal Linear Best Feasible Arm Identification with Fixed Budget

    Authors: Jie Bian, Vincent Y. F. Tan

    Abstract: The challenge of identifying the best feasible arm within a fixed budget has attracted considerable interest in recent years. However, a notable gap remains in the literature: the exact exponential rate at which the error probability approaches zero has yet to be established, even in the relatively simple setting of $K$-armed bandits with Gaussian noise. In this paper, we address this gap by exami…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: Accepted to the Conference on Uncertainty in Artificial Intelligence (UAI) 2025

  22. arXiv:2506.01456  [pdf]

    q-bio.GN cs.AI cs.LG q-bio.NC

    GenDMR: A dynamic multimodal role-swapping network for identifying risk gene phenotypes

    Authors: Lina Qin, Cheng Zhu, Chuqi Zhou, Yukun Huang, Jiayi Zhu, Ping Liang, Jinju Wang, Yixing Huang, Cheng Luo, Dezhong Yao, Ying Tan

    Abstract: Recent studies have shown that integrating multimodal data fusion techniques for imaging and genetic features is beneficial for the etiological analysis and predictive diagnosis of Alzheimer's disease (AD). However, there are several critical flaws in current deep learning methods. Firstly, there has been insufficient discussion and exploration regarding the selection and encoding of genetic infor…

    Submitted 2 June, 2025; originally announced June 2025.

    Comments: 31 pages, 9 figures

  23. arXiv:2506.00842  [pdf, ps, other]

    cs.CL cs.AI

    Toward Structured Knowledge Reasoning: Contrastive Retrieval-Augmented Generation on Experience

    Authors: Jiawei Gu, Ziting Xian, Yuanzhen Xie, Ye Liu, Enjie Liu, Ruichao Zhong, Mochi Gao, Yunzhi Tan, Bo Hu, Zang Li

    Abstract: Large language models (LLMs) achieve strong performance on plain text tasks but underperform on structured data like tables and databases. Potential challenges arise from their underexposure during pre-training and rigid text-to-structure transfer mechanisms. Unlike humans who seamlessly apply learned patterns across data modalities, LLMs struggle to infer implicit relationships embedded in tabula…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: ACL 2025 Findings

  24. arXiv:2505.24840  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    Vision LLMs Are Bad at Hierarchical Visual Understanding, and LLMs Are the Bottleneck

    Authors: Yuwen Tan, Yuan Qing, Boqing Gong

    Abstract: This paper reveals that many state-of-the-art large language models (LLMs) lack hierarchical knowledge about our visual world, unaware of even well-established biology taxonomies. This shortcoming makes LLMs a bottleneck for vision LLMs' hierarchical visual understanding (e.g., recognizing Anemone Fish but not Vertebrate). We arrive at these findings using about one million four-choice visual ques…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: 28 pages, 13 figures

  25. arXiv:2505.23793  [pdf, ps, other]

    cs.CR cs.AI

    USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models

    Authors: Baolin Zheng, Guanlin Chen, Hongqiong Zhong, Qingyang Teng, Yingshui Tan, Zhendong Liu, Weixun Wang, Jiaheng Liu, Jian Yang, Huiyun Jing, Jincheng Wei, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

    Abstract: Despite their remarkable achievements and widespread adoption, Multimodal Large Language Models (MLLMs) have revealed significant security vulnerabilities, highlighting the urgent need for robust safety evaluation benchmarks. Existing MLLM safety benchmarks, however, fall short in terms of data quality and coverage, and modal risk combinations, resulting in inflated and contradictory evaluation res…

    Submitted 26 May, 2025; originally announced May 2025.

  26. arXiv:2505.23352  [pdf, other]

    cs.MA cs.AI

    Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems

    Authors: Xu Shen, Yixin Liu, Yiwei Dai, Yili Wang, Rui Miao, Yue Tan, Shirui Pan, Xin Wang

    Abstract: The communication topology in large language model-based multi-agent systems fundamentally governs inter-agent collaboration patterns, critically shaping both the efficiency and effectiveness of collective decision-making. While recent studies on automated communication topology design tend to construct sparse structures for efficiency, they often overlook why and when sparse and dense topologies…

    Submitted 29 May, 2025; originally announced May 2025.

  27. arXiv:2505.23165  [pdf, ps, other]

    cs.LG cs.AI cs.IT

    Best Arm Identification with Possibly Biased Offline Data

    Authors: Le Yang, Vincent Y. F. Tan, Wang Chi Cheung

    Abstract: We study the best arm identification (BAI) problem with potentially biased offline data in the fixed confidence setting, which commonly arises in real-world scenarios such as clinical trials. We prove an impossibility result for adaptive algorithms without prior knowledge of the bias bound between online and offline distributions. To address this, we propose the LUCB-H algorithm, which introduces…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: Accepted to UAI 2025

  28. arXiv:2505.20301  [pdf, ps, other]

    q-bio.QM cs.LG

    Sequence-Only Prediction of Binding Affinity Changes: A Robust and Interpretable Model for Antibody Engineering

    Authors: Chen Liu, Mingchen Li, Yang Tan, Wenrui Gou, Guisheng Fan, Bingxin Zhou

    Abstract: A pivotal area of research in antibody engineering is to find effective modifications that enhance antibody-antigen binding affinity. Traditional wet-lab experiments assess mutants in a costly and time-consuming manner. Emerging deep learning solutions offer an alternative by modeling antibody structures to predict binding affinity changes. However, they heavily depend on high-quality complex stru…

    Submitted 14 May, 2025; originally announced May 2025.

  29. arXiv:2505.20003  [pdf, ps, other]

    cs.LG stat.ME stat.ML

    TabPFN: One Model to Rule Them All?

    Authors: Qiong Zhang, Yan Shuo Tan, Qinglong Tian, Pengfei Li

    Abstract: Hollmann et al. (Nature 637 (2025) 319-326) recently introduced TabPFN, a transformer-based deep learning model for regression and classification on tabular data, which they claim "outperforms all previous methods on datasets with up to 10,000 samples by a wide margin, using substantially less training time." Furthermore, they have called TabPFN a "foundation model" for tabular data, as it can sup…

    Submitted 26 May, 2025; originally announced May 2025.

  30. arXiv:2505.19690  [pdf, ps, other]

    cs.AI

    Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models

    Authors: Baihui Zheng, Boren Zheng, Kerui Cao, Yingshui Tan, Zhendong Liu, Weixun Wang, Jiaheng Liu, Jian Yang, Wenbo Su, Xiaoyong Zhu, Bo Zheng, Kaifu Zhang

    Abstract: Despite the remarkable proficiency of Large Reasoning Models (LRMs) in handling complex reasoning tasks, their reliability in safety-critical scenarios remains uncertain. Existing evaluations primarily assess response-level safety, neglecting a critical issue we identify as Superficial Safety Alignment (SSA), a phenomenon where models produce superficially safe outputs…

    Submitted 26 May, 2025; originally announced May 2025.

  31. arXiv:2505.18174  [pdf, ps, other]

    eess.SP cs.AI cs.LG

    NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection

    Authors: Peihong Zhang, Zhixin Li, Rui Sang, Yuxuan Liu, Yiqiang Cai, Yizhou Tan, Shengchen Li

    Abstract: Electrocardiogram (ECG) and Phonocardiogram (PCG) signals are linked by a latent coupling signal representing the electrical-to-mechanical cardiac transformation. While valuable for cardiovascular disease (CVD) detection, this coupling signal is traditionally estimated using deconvolution methods that amplify noise, limiting clinical utility. In this paper, we propose Noise-Robust Multi-Modal Coup…

    Submitted 2 June, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  32. arXiv:2505.17095  [pdf, ps, other]

    cs.CL

    Are LLMs reliable? An exploration of the reliability of large language models in clinical note generation

    Authors: Kristine Ann M. Carandang, Jasper Meynard P. Araña, Ethan Robert A. Casin, Christopher P. Monterola, Daniel Stanley Y. Tan, Jesus Felix B. Valenzuela, Christian M. Alis

    Abstract: Due to the legal and ethical responsibilities of healthcare providers (HCPs) for accurate documentation and protection of patient data privacy, the natural variability in the responses of large language models (LLMs) presents challenges for incorporating clinical note generation (CNG) systems, driven by LLMs, into real-world clinical processes. The complexity is further amplified by the detailed n…

    Submitted 20 May, 2025; originally announced May 2025.

  33. arXiv:2505.15659  [pdf, ps, other]

    cs.RO cs.LG

    FLARE: Robot Learning with Implicit World Modeling

    Authors: Ruijie Zheng, Jing Wang, Scott Reed, Johan Bjorck, Yu Fang, Fengyuan Hu, Joel Jang, Kaushil Kundalia, Zongyu Lin, Loic Magne, Avnish Narayan, You Liang Tan, Guanzhi Wang, Qi Wang, Jiannan Xiang, Yinzhen Xu, Seonghyeon Ye, Jan Kautz, Furong Huang, Yuke Zhu, Linxi Fan

    Abstract: We introduce Future LAtent REpresentation Alignment (FLARE), a novel framework that integrates predictive latent world modeling into robot policy learning. By aligning features from a diffusion transformer with latent embeddings of future observations, FLARE enables a diffusion transformer policy to anticipate latent representations of future…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: Project Webpage / Blogpost: https://research.nvidia.com/labs/gear/flare

  34. arXiv:2505.15141  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms

    Authors: Yunlong Hou, Fengzhuo Zhang, Cunxiao Du, Xuan Zhang, Jiachun Pan, Tianyu Pang, Chao Du, Vincent Y. F. Tan, Zhuoran Yang

    Abstract: Speculative decoding has emerged as a popular method to accelerate the inference of Large Language Models (LLMs) while retaining their superior text generation performance. Previous methods either adopt a fixed speculative decoding configuration regardless of the prefix tokens, or train draft models in an offline or online manner to align them with the context. This paper proposes a training-free…

    Submitted 21 May, 2025; originally announced May 2025.

    Comments: 35 pages, 4 figures

  35. arXiv:2505.14552  [pdf, other]

    cs.CL cs.AI cs.LG

    KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation

    Authors: Jiajun Shi, Jian Yang, Jiaheng Liu, Xingyuan Bu, Jiangjie Chen, Junting Zhou, Kaijing Ma, Zhoufutu Wen, Bingli Wang, Yancheng He, Liang Song, Hualei Zhu, Shilong Li, Xingjian Wang, Wei Zhang, Ruibin Yuan, Yifan Yao, Wenjun Yang, Yunli Wang, Siyuan Fang, Siyu Yuan, Qianyu He, Xiangru Tang, Yingshui Tan, Wangchunshu Zhou, et al. (4 additional authors not shown)

    Abstract: Recent advancements in large language models (LLMs) underscore the need for more comprehensive evaluation methods to accurately assess their reasoning capabilities. Existing benchmarks are often domain-specific and thus cannot fully capture an LLM's general reasoning potential. To address this limitation, we introduce the Knowledge Orthogonal Reasoning Gymnasium (KORGym), a dynamic evaluation plat…

    Submitted 21 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

    Comments: 22 pages

  36. arXiv:2505.14436  [pdf, other]

    cs.CL cs.AI

    Neural Incompatibility: The Unbridgeable Gap of Cross-Scale Parametric Knowledge Transfer in Large Language Models

    Authors: Yuqiao Tan, Shizhu He, Kang Liu, Jun Zhao

    Abstract: Large Language Models (LLMs) offer a transparent brain with accessible parameters that encode extensive knowledge, which can be analyzed, located and transferred. Consequently, a key research challenge is to transcend traditional knowledge transfer paradigms rooted in symbolic language and achieve genuine Parametric Knowledge Transfer (PKT). Significantly, exploring effective methods for transferr…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL'25 Main. Code link: https://github.com/Trae1ounG/Neural_Incompatibility

  37. arXiv:2505.12814  [pdf, ps, other]

    cs.CL cs.AI

    PsyMem: Fine-grained psychological alignment and Explicit Memory Control for Advanced Role-Playing LLMs

    Authors: Xilong Cheng, Yunxiao Qin, Yuting Tan, Zhengnan Li, Ye Wang, Hongjiang Xiao, Yuan Zhang

    Abstract: Existing LLM-based role-playing methods often rely on superficial textual descriptions or simplistic metrics, inadequately modeling both intrinsic and extrinsic character dimensions. Additionally, they typically simulate character memory with implicit model knowledge or basic retrieval-augmented generation without explicit memory alignment, compromising memory consistency. The two issues weaken reli…

    Submitted 19 May, 2025; originally announced May 2025.

  38. arXiv:2505.12705  [pdf, ps, other]

    cs.RO cs.AI cs.LG

    DreamGen: Unlocking Generalization in Robot Learning through Video World Models

    Authors: Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, Loic Magne, Ajay Mandlekar, Avnish Narayan, You Liang Tan, Guanzhi Wang, Jing Wang, Qi Wang, Yinzhen Xu, Xiaohui Zeng, Kaiyuan Zheng, Ruijie Zheng, Ming-Yu Liu, Luke Zettlemoyer, Dieter Fox, Jan Kautz, et al. (3 additional authors not shown)

    Abstract: We introduce DreamGen, a simple yet highly effective 4-stage pipeline for training robot policies that generalize across behaviors and environments through neural trajectories: synthetic robot data generated from video world models. DreamGen leverages state-of-the-art image-to-video generative models, adapting them to the target robot embodiment to produce photorealistic synthetic videos of famil…

    Submitted 17 June, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

    Comments: See website for videos: https://research.nvidia.com/labs/gear/dreamgen

  39. arXiv:2505.11812  [pdf, ps, other]

    cs.LG cs.CL q-bio.QM

    VenusX: Unlocking Fine-Grained Functional Understanding of Proteins

    Authors: Yang Tan, Wenrui Gou, Bozitao Zhong, Liang Hong, Huiqun Yu, Bingxin Zhou

    Abstract: Deep learning models have driven significant progress in predicting protein function and interactions at the protein level. While these advancements have been invaluable for many biological applications such as enzyme engineering and function annotation, a more detailed perspective is essential for understanding protein functional mechanisms and evaluating the biological knowledge captured by mode…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: 29 pages, 3 figures, 17 tables

  40. arXiv:2505.10996  [pdf, other]

    cs.CV

    Visual Anomaly Detection under Complex View-Illumination Interplay: A Large-Scale Benchmark

    Authors: Yunkang Cao, Yuqi Cheng, Xiaohao Xu, Yiheng Zhang, Yihan Sun, Yuxiang Tan, Yuxin Zhang, Xiaonan Huang, Weiming Shen

    Abstract: The practical deployment of Visual Anomaly Detection (VAD) systems is hindered by their sensitivity to real-world imaging variations, particularly the complex interplay between viewpoint and illumination which drastically alters defect visibility. Current benchmarks largely overlook this critical challenge. We introduce Multi-View Multi-Illumination Anomaly Detection (M2AD), a new large-scale benc…

    Submitted 16 May, 2025; originally announced May 2025.

    Comments: Homepage: https://hustcyq.github.io/M2AD/. Yunkang Cao and Yuqi Cheng contribute equally to this work

  41. arXiv:2505.05279  [pdf, other]

    cs.LG cs.CR cs.CV

    MTL-UE: Learning to Learn Nothing for Multi-Task Learning

    Authors: Yi Yu, Song Xia, Siyuan Yang, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

    Abstract: Most existing unlearnable strategies focus on preventing unauthorized users from training single-task learning (STL) models with personal data. Nevertheless, the paradigm has recently shifted towards multi-task data and multi-task learning (MTL), targeting generalist and foundation models that can handle multiple tasks simultaneously. Despite their growing importance, MTL data and models have been…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025

  42. arXiv:2505.04421  [pdf, other]

    cs.IR

    LONGER: Scaling Up Long Sequence Modeling in Industrial Recommenders

    Authors: Zheng Chai, Qin Ren, Xijun Xiao, Huizhi Yang, Bo Han, Sijun Zhang, Di Chen, Hui Lu, Wenlin Zhao, Lele Yu, Xionghang Xie, Shiru Ren, Xiang Sun, Yaocheng Tan, Peng Xu, Yuchao Zheng, Di Wu

    Abstract: Modeling ultra-long user behavior sequences is critical for capturing both long- and short-term preferences in industrial recommender systems. Existing solutions typically rely on two-stage retrieval or indirect modeling paradigms, incurring upstream-downstream inconsistency and computational inefficiency. In this paper, we present LONGER, a Long-sequence Optimized traNsformer for GPU-Efficient Rec…

    Submitted 7 May, 2025; originally announced May 2025.

  43. arXiv:2505.03748  [pdf, ps, other]

    cs.AR cs.AI

    APSQ: Additive Partial Sum Quantization with Algorithm-Hardware Co-Design

    Authors: Yonghao Tan, Pingcheng Dong, Yongkun Wu, Yu Liu, Xuejiao Liu, Peng Luo, Shih-Yang Liu, Xijie Huang, Dong Zhang, Luhong Liang, Kwang-Ting Cheng

    Abstract: DNN accelerators, significantly advanced by model compression and specialized dataflow techniques, have marked considerable progress. However, the frequent access of high-precision partial sums (PSUMs) leads to excessive memory demands in architectures utilizing input/weight stationary dataflows. Traditional compression strategies have typically overlooked PSUM quantization, which may account for…

    Submitted 10 April, 2025; originally announced May 2025.

    Comments: 62nd ACM/IEEE Design Automation Conference (DAC) 2025

  44. arXiv:2505.03114  [pdf, other]

    cs.CV

    Path and Bone-Contour Regularized Unpaired MRI-to-CT Translation

    Authors: Teng Zhou, Jax Luo, Yuping Sun, Yiheng Tan, Shun Yao, Nazim Haouchine, Scott Raymond

    Abstract: Accurate MRI-to-CT translation promises the integration of complementary imaging information without the need for additional imaging sessions. Given the practical challenges associated with acquiring paired MRI and CT scans, the development of robust methods capable of leveraging unpaired datasets is essential for advancing the MRI-to-CT translation. Current unpaired MRI-to-CT translation methods,…

    Submitted 5 May, 2025; originally announced May 2025.

  45. arXiv:2504.20829  [pdf, other]

    cs.CV cs.AI

    GaussTrap: Stealthy Poisoning Attacks on 3D Gaussian Splatting for Targeted Scene Confusion

    Authors: Jiaxin Hong, Sixu Chen, Shuoyang Sun, Hongyao Yu, Hao Fang, Yuqi Tan, Bin Chen, Shuhan Qi, Jiawei Li

    Abstract: As 3D Gaussian Splatting (3DGS) emerges as a breakthrough in scene representation and novel view synthesis, its rapid adoption in safety-critical domains (e.g., autonomous systems, AR/VR) urgently demands scrutiny of potential security vulnerabilities. This paper presents the first systematic study of backdoor threats in 3DGS pipelines. We identify that adversaries may implant backdoor views to in…

    Submitted 29 April, 2025; originally announced April 2025.

  46. arXiv:2504.19362  [pdf, other]

    eess.IV cs.AI cs.CV

    Low-Rank Adaptive Structural Priors for Generalizable Diabetic Retinopathy Grading

    Authors: Yunxuan Wang, Ray Yin, Yumei Tan, Hao Chen, Haiying Xia

    Abstract: Diabetic retinopathy (DR), a serious ocular complication of diabetes, is one of the primary causes of vision loss among retinal vascular diseases. Deep learning methods have been extensively applied in the grading of diabetic retinopathy (DR). However, their performance declines significantly when applied to data outside the training distribution due to domain shifts. Domain generalization (DG) ha…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: Accepted by IJCNN 2025

  47. arXiv:2504.18053  [pdf, ps, other]

    cs.CL cs.CV

    DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models

    Authors: Jianyu Liu, Hangyu Guo, Ranjie Duan, Xingyuan Bu, Yancheng He, Shilong Li, Hui Huang, Jiaheng Liu, Yucheng Wang, Chenchen Jing, Xingwei Qu, Xiao Zhang, Yingshui Tan, Yanan Wu, Jihao Gu, Yangguang Li, Jianke Zhu

    Abstract: Multimodal Large Language Models (MLLMs) pose unique safety challenges due to their integration of visual and textual data, thereby introducing new dimensions of potential attacks and complex risk combinations. In this paper, we begin with a detailed analysis aimed at disentangling risks through step-by-step reasoning within multimodal inputs. We find that systematic multimodal risk disentanglemen…

    Submitted 5 June, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: [NAACL 2025] The first four authors contribute equally, 23 pages, repo at https://github.com/Kizna1ver/DREAM

  48. arXiv:2504.14866  [pdf, ps, other]

    cs.AR cs.ET

    GainSight: Application-Guided Profiling for Composing Heterogeneous On-Chip Memories in AI Hardware Accelerators

    Authors: Peijing Li, Matthew Hung, Yiming Tan, Konstantin Hoßfeld, Jake Cheng Jiajun, Shuhan Liu, Lixian Yan, Xinxin Wang, H.-S. Philip Wong, Thierry Tambe

    Abstract: As AI workloads drive soaring memory requirements, higher-density on-chip memory is needed for domain-specific accelerators beyond what current SRAM technology can provide. We motivate that algorithms and application behavior should guide the composition of heterogeneous on-chip memories. However, little work has incorporated dynamic application profiles into these design decisions, and no existin…

    Submitted 24 June, 2025; v1 submitted 21 April, 2025; originally announced April 2025.

    Comments: 16 pages, 10 figures

    ACM Class: B.7.1; B.3.1; C.3; I.6; I.2.6

  49. arXiv:2504.14541  [pdf, other]

    cs.CR cs.CV cs.LG

    Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation

    Authors: Yi Yu, Song Xia, Xun Lin, Chenqi Kong, Wenhan Yang, Shijian Lu, Yap-Peng Tan, Alex C. Kot

    Abstract: Adversarial examples, characterized by imperceptible perturbations, pose significant threats to deep neural networks by misleading their predictions. A critical aspect of these examples is their transferability, allowing them to deceive unseen models in black-box scenarios. Despite the widespread exploration of defense methods, including those on transferability, they show limitations: inefficie…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: Accepted by IEEE TIFS 2025

  50. arXiv:2504.10976  [pdf, other]

    cs.CV

    Adaptive Decision Boundary for Few-Shot Class-Incremental Learning

    Authors: Linhao Li, Yongzhang Tan, Siyuan Yang, Hao Cheng, Yongfeng Dong, Liang Yang

    Abstract: Few-Shot Class-Incremental Learning (FSCIL) aims to continuously learn new classes from a limited set of training samples without forgetting knowledge of previously learned classes. Conventional FSCIL methods typically build a robust feature extractor during the base training session with abundant training samples and subsequently freeze this extractor, only fine-tuning the classifier in subsequen…

    Submitted 17 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.