Showing 1–50 of 8,061 results for author: Fan

Searching in archive cs.
  1. arXiv:2506.21551  [pdf, ps, other]

    cs.LG

    Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

    Authors: Ziyue Li, Chenrui Fan, Tianyi Zhou

    Abstract: Grokking, i.e., test performance keeps improving long after training loss converged, has been recently witnessed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkp…

    Submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.21539  [pdf, ps, other]

    cs.RO cs.AI

    WorldVLA: Towards Autoregressive Action World Model

    Authors: Jun Cen, Chaohui Yu, Hangjie Yuan, Yuming Jiang, Siteng Huang, Jiayan Guo, Xin Li, Yibing Song, Hao Luo, Fan Wang, Deli Zhao, Hao Chen

    Abstract: We present WorldVLA, an autoregressive action world model that unifies action and image understanding and generation. Our WorldVLA integrates Vision-Language-Action (VLA) model and world model in one single framework. The world model predicts future images by leveraging both action and image understanding, with the purpose of learning the underlying physics of the environment to improve action ge…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Code: https://github.com/alibaba-damo-academy/WorldVLA

  3. arXiv:2506.21513  [pdf, ps, other]

    cs.CV

    GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation

    Authors: Wentao Hu, Shunkai Li, Ziqiao Peng, Haoxian Zhang, Fan Shi, Xiaoqiang Liu, Pengfei Wan, Di Zhang, Hui Tian

    Abstract: Creating high-quality, generalizable speech-driven 3D talking heads remains a persistent challenge. Previous methods achieve satisfactory results for fixed viewpoints and small-scale audio variations, but they struggle with large head rotations and out-of-distribution (OOD) audio. Moreover, they are constrained by the need for time-consuming, identity-specific training. We believe the core issue l…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: ICCV 2025, Project page: https://vincenthu19.github.io/GGTalker/

  4. arXiv:2506.21414  [pdf, ps, other]

    cs.AR

    Accelerating GNN Training through Locality-aware Dropout and Merge

    Authors: Gongjian Sun, Mingyu Yan, Dengke Han, Runzhen Xue, Duo Wang, Xiaochun Ye, Dongrui Fan

    Abstract: Graph Neural Networks (GNNs) have demonstrated significant success in graph learning and are widely adopted across various critical domains. However, the irregular connectivity between vertices leads to inefficient neighbor aggregation, resulting in substantial irregular and coarse-grained DRAM accesses. This lack of data locality presents significant challenges for execution platforms, ultimately…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Under review in TPDS. Extended version of DATE 2025

  5. arXiv:2506.21356  [pdf, ps, other]

    cs.CV

    ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models

    Authors: Hongbo Liu, Jingwen He, Yi Jin, Dian Zheng, Yuhao Dong, Fan Zhang, Ziqi Huang, Yinan He, Yangguang Li, Weichao Chen, Yu Qiao, Wanli Ouyang, Shengjie Zhao, Ziwei Liu

    Abstract: Cinematography, the fundamental visual language of film, is essential for conveying narrative, emotion, and aesthetic quality. While recent Vision-Language Models (VLMs) demonstrate strong general visual understanding, their proficiency in comprehending the nuanced cinematic grammar embedded within individual shots remains largely unexplored and lacks robust evaluation. This critical gap limits bo…

    Submitted 26 June, 2025; originally announced June 2025.

  6. arXiv:2506.21285  [pdf, ps, other]

    cs.CL

    Double-Checker: Enhancing Reasoning of Slow-Thinking LLMs via Self-Critical Fine-Tuning

    Authors: Xin Xu, Tianhao Chen, Fan Zhang, Wanlong Liu, Pengxiang Li, Ajay Kumar Jaiswal, Yuchen Yan, Jishan Hu, Yang Wang, Hao Chen, Shiwei Liu, Shizhe Diao, Can Yang, Lu Yin

    Abstract: While slow-thinking large language models (LLMs) exhibit reflection-like reasoning, commonly referred to as the "aha moment", their ability to generate informative critiques and refine prior solutions remains limited. In this paper, we introduce Double-Checker, a principled framework designed to enhance the reasoning capabilities of slow-thinking LLMs by fostering explicit self-critique and iterat…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: 10 pages

  7. arXiv:2506.21269  [pdf, ps, other]

    cs.SD cs.AI eess.AS

    Integrating Vehicle Acoustic Data for Enhanced Urban Traffic Management: A Study on Speed Classification in Suzhou

    Authors: Pengfei Fan, Yuli Zhang, Xinheng Wang, Ruiyuan Jiang, Hankang Gu, Dongyao Jia, Shangbo Wang

    Abstract: This study presents and publicly releases the Suzhou Urban Road Acoustic Dataset (SZUR-Acoustic Dataset), which is accompanied by comprehensive data-acquisition protocols and annotation guidelines to ensure transparency and reproducibility of the experimental workflow. To model the coupling between vehicular noise and driving speed, we propose a bimodal-feature-fusion deep convolutional neural net…

    Submitted 26 June, 2025; originally announced June 2025.

  8. arXiv:2506.21093  [pdf, ps, other]

    cs.LG cs.IT eess.SP stat.ML

    Chain-of-Thought Enhanced Shallow Transformers for Wireless Symbol Detection

    Authors: Li Fan, Peng Wang, Jing Yang, Cong Shen

    Abstract: Transformers have shown potential in solving wireless communication problems, particularly via in-context learning (ICL), where models adapt to new tasks through prompts without requiring model updates. However, prior ICL-based Transformer models rely on deep architectures with many layers to achieve satisfactory performance, resulting in substantial storage and computational costs. In this work,…

    Submitted 26 June, 2025; originally announced June 2025.

  9. arXiv:2506.20702  [pdf]

    cs.AI cs.CY

    The Singapore Consensus on Global AI Safety Research Priorities

    Authors: Yoshua Bengio, Tegan Maharaj, Luke Ong, Stuart Russell, Dawn Song, Max Tegmark, Lan Xue, Ya-Qin Zhang, Stephen Casper, Wan Sie Lee, Sören Mindermann, Vanessa Wilfred, Vidhisha Balachandran, Fazl Barez, Michael Belinsky, Imane Bello, Malo Bourgon, Mark Brakel, Siméon Campos, Duncan Cass-Beggs, Jiahao Chen, Rumman Chowdhury, Kuan Chua Seah, Jeff Clune, Juntao Dai, et al. (61 additional authors not shown)

    Abstract: Rapidly improving AI capabilities and autonomy hold significant promise of transformation, but are also driving vigorous debate on how to ensure that AI is safe, i.e., trustworthy, reliable, and secure. Building a trusted ecosystem is therefore essential -- it helps people embrace AI with confidence and gives maximal space for innovation while avoiding backlash. The "2025 Singapore Conference on…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Final report from the "2025 Singapore Conference on AI (SCAI)" held April 26: https://www.scai.gov.sg/2025/scai2025-report

  10. arXiv:2506.20644  [pdf, ps, other]

    cs.LG

    Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices

    Authors: Hangyu Li, Hongyue Wu, Guodong Fan, Zhen Zhang, Shizhan Chen, Zhiyong Feng

    Abstract: As privacy protection gains increasing importance, more models are being trained on edge devices and subsequently merged into the central server through Federated Learning (FL). However, current research overlooks the impact of network topology, physical distance, and data heterogeneity on edge devices, leading to issues such as increased latency and degraded model performance. To address these is…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by ICWS 2025

  11. arXiv:2506.20512  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

    Authors: Zengzhi Wang, Fan Zhou, Xuefeng Li, Pengfei Liu

    Abstract: Different base language model families, such as Llama and Qwen, exhibit divergent behaviors during post-training with reinforcement learning (RL), especially on reasoning-intensive tasks. What makes a base language model suitable for reinforcement learning? Gaining deeper insight into this question is essential for developing RL-scalable foundation models of the next generation. In this work, we i…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 26 pages; The first three authors contribute to this work equally

  12. arXiv:2506.20430  [pdf, ps, other]

    cs.CL cs.AI cs.CV cs.MA

    An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

    Authors: Weike Zhao, Chaoyi Wu, Yanjie Fan, Xiaoman Zhang, Pengcheng Qiu, Yuze Sun, Xiao Zhou, Yanfeng Wang, Ya Zhang, Yongguo Yu, Kun Sun, Weidi Xie

    Abstract: Rare diseases collectively affect over 300 million individuals worldwide, yet timely and accurate diagnosis remains a pervasive challenge. This is largely due to their clinical heterogeneity, low individual prevalence, and the limited familiarity most clinicians have with rare conditions. Here, we introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model…

    Submitted 25 June, 2025; originally announced June 2025.

  13. arXiv:2506.20373  [pdf, ps, other]

    cs.RO cs.AI cs.HC

    CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition

    Authors: Joerg Deigmoeller, Stephan Hasler, Nakul Agarwal, Daniel Tanneberg, Anna Belardinelli, Reza Ghoddoosian, Chao Wang, Felix Ocker, Fan Zhang, Behzad Dariush, Michael Gienger

    Abstract: We introduce CARMA, a system for situational grounding in human-robot group interactions. Effective collaboration in such group settings requires situational awareness based on a consistent representation of present persons and objects coupled with an episodic abstraction of events regarding actors and manipulated objects. This calls for a clear and consistent assignment of instances, ensuring tha…

    Submitted 25 June, 2025; originally announced June 2025.

  14. Semantic-enhanced Modality-asymmetric Retrieval for Online E-commerce Search

    Authors: Zhigong Zhou, Ning Ding, Xiaochuan Fan, Yue Shang, Yiming Qiu, Jingwei Zhuo, Zhiwei Ge, Songlin Wang, Lin Liu, Sulong Xu, Han Zhang

    Abstract: Semantic retrieval, which retrieves semantically matched items given a textual query, has been an essential component to enhance system effectiveness in e-commerce search. In this paper, we study the multimodal retrieval problem, where the visual information (e.g., image) of an item is leveraged as a supplement to textual information to enrich item representation and further improve retrieval perform…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Published in SIGIR 2023

  15. arXiv:2506.20282  [pdf, ps, other]

    eess.IV cs.CV

    Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration

    Authors: Jiaxing Huang, Heng Guo, Le Lu, Fan Yang, Minfeng Xu, Ge Yang, Wei Luo

    Abstract: Osteoporosis, characterized by reduced bone mineral density (BMD) and compromised bone microstructure, increases fracture risk in aging populations. While dual-energy X-ray absorptiometry (DXA) is the clinical standard for BMD assessment, its limited accessibility hinders diagnosis in resource-limited regions. Opportunistic computed tomography (CT) analysis has emerged as a promising alternative f…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025

  16. arXiv:2506.20241  [pdf, ps, other]

    cs.CL cs.AI

    Enhancing Large Language Models through Structured Reasoning

    Authors: Yubo Dong, Hehe Fan

    Abstract: Recent Large Language Models (LLMs) have significantly advanced natural language processing and automated decision-making. However, these models still encounter difficulties when performing complex reasoning tasks involving logical deduction and systematic planning, primarily due to their reliance on implicit statistical relationships without structured knowledge representation. Inspired by cogniti…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: Preprint. Under review

  17. arXiv:2506.20151  [pdf, ps, other]

    cs.CV cs.AI

    EAR: Erasing Concepts from Unified Autoregressive Models

    Authors: Haipeng Fan, Shiyuan Zhang, Baohunesitu, Zihang Guo, Huaiwen Zhang

    Abstract: Autoregressive (AR) models have achieved unified and strong performance across both visual understanding and image generation tasks. However, removing undesired concepts from AR models while maintaining overall generation quality remains an open challenge. In this paper, we propose Erasure Autoregressive Model (EAR), a fine-tuning method for effective and utility-preserving concept erasure in AR m…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 11 pages, 7 figures, 1 table

  18. arXiv:2506.19885  [pdf, ps, other]

    cs.LG cs.AI eess.SY

    FlightKooba: A Fast Interpretable FTP Model

    Authors: Jing Lu, Xuan Wu, Yizhun Tian, Songhan Fan, Yali Fang

    Abstract: The Koopman theory is a powerful and effective modeling tool for converting nonlinear systems into linear representations, and flight trajectory prediction (FTP) is a complex nonlinear system. However, current models applying the Koopman theory to FTP tasks are not very effective, model interpretability is indeed an issue, and the Koopman operators are computationally intensive, resulting in long…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: 7 figures

  19. arXiv:2506.19884  [pdf, ps, other]

    cs.OS cs.AI cs.PF cs.SE

    MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

    Authors: Zhengxiang Huang, Chaoyue Niu, Zhaode Wang, Jiarui Xue, Hanming Zhang, Yugang Wang, Zewei Xin, Xiaotang Jiang, Chengfei Lv, Fan Wu, Guihai Chen

    Abstract: As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, and yet most existing works focus on accelerating the prefill phase, neglecting energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS)…

    Submitted 24 June, 2025; originally announced June 2025.

  20. arXiv:2506.19802  [pdf, ps, other]

    cs.CR cs.IR

    KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs

    Authors: Xin Fan Guo, Albert Merono Penuela, Sergio Maffeis, Fabio Pierazzi

    Abstract: Despite extensive research on Machine Learning-based Network Intrusion Detection Systems (ML-NIDS), their capability to detect diverse attack variants remains uncertain. Prior studies have largely relied on homogeneous datasets, which artificially inflate performance scores and offer a false sense of security. Designing systems that can effectively detect a wide range of attack variants remains a…

    Submitted 24 June, 2025; originally announced June 2025.

  21. arXiv:2506.19467  [pdf, ps, other]

    cs.CL cs.AI

    Can Large Language Models Capture Human Annotator Disagreements?

    Authors: Jingwei Ni, Yu Fan, Vilém Zouhar, Donya Rooein, Alexander Hoyle, Mrinmaya Sachan, Markus Leippold, Dirk Hovy, Elliott Ash

    Abstract: Human annotation variation (i.e., annotation disagreements) is common in NLP and often reflects important information such as task subjectivity and sample ambiguity. While Large Language Models (LLMs) are increasingly used for automatic annotation to reduce human effort, their evaluation often focuses on predicting the majority-voted "ground truth" labels. It is still unclear, however, whether the…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Preprint Under Review

  22. arXiv:2506.19425  [pdf, ps, other]

    cs.SE

    What Makes the Best Decomposition? Investigating Binary Decomposition Under FCG Variance

    Authors: Ang Jia, He Jiang, Zhilei Ren, Xiaochen Li, Ming Fan, Ting Liu

    Abstract: Binary decomposition, which decomposes binary files into modules, plays a critical role in binary reuse detection. Existing binary decomposition works either apply anchor-based methods by extending anchor functions to generate modules, or apply clustering-based methods by using clustering algorithms to group binary functions, which all rely on that reused code shares similar function call relation…

    Submitted 24 June, 2025; originally announced June 2025.

  23. arXiv:2506.19324  [pdf, ps, other]

    cs.CV

    Memory-Augmented Incomplete Multimodal Survival Prediction via Cross-Slide and Gene-Attentive Hypergraph Learning

    Authors: Mingcheng Qu, Guang Yang, Donglin Di, Yue Gao, Tonghua Su, Yang Song, Lei Fan

    Abstract: Multimodal pathology-genomic analysis is critical for cancer survival prediction. However, existing approaches predominantly integrate formalin-fixed paraffin-embedded (FFPE) slides with genomic data, while neglecting the availability of other preservation slides, such as Fresh Frozen (FF) slides. Moreover, as the high-resolution spatial nature of pathology data tends to dominate the cross-modality…

    Submitted 24 June, 2025; originally announced June 2025.

    Comments: Accepted by MICCAI 2025. Code: https://github.com/MCPathology/M2Surv

  24. arXiv:2506.19300  [pdf, ps, other]

    cs.CV

    Open-Vocabulary Camouflaged Object Segmentation with Cascaded Vision Language Models

    Authors: Kai Zhao, Wubang Yuan, Zheng Wang, Guanyi Li, Xiaoqiang Zhu, Deng-ping Fan, Dan Zeng

    Abstract: Open-Vocabulary Camouflaged Object Segmentation (OVCOS) seeks to segment and classify camouflaged objects from arbitrary categories, presenting unique challenges due to visual ambiguity and unseen categories. Recent approaches typically adopt a two-stage paradigm: first segmenting objects, then classifying the segmented regions using Vision Language Models (VLMs). However, these methods (1) suffer f…

    Submitted 24 June, 2025; originally announced June 2025.

  25. arXiv:2506.19269  [pdf, ps, other]

    cs.RO cs.AI

    AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation

    Authors: Ziyan Zhao, Ke Fan, He-Yang Xu, Ning Qiao, Bo Peng, Wenlong Gao, Dongjiang Li, Hui Shen

    Abstract: We present AnchorDP3, a diffusion policy framework for dual-arm robotic manipulation that achieves state-of-the-art performance in highly randomized environments. AnchorDP3 integrates three key innovations: (1) Simulator-Supervised Semantic Segmentation, using rendered ground truth to explicitly segment task-critical objects within the point cloud, which provides strong affordance priors; (2) Task…

    Submitted 25 June, 2025; v1 submitted 23 June, 2025; originally announced June 2025.

  26. arXiv:2506.19171  [pdf, ps, other]

    cs.LG

    Distilling Tool Knowledge into Language Models via Back-Translated Traces

    Authors: Xingyue Huang, Xianglong Hu, Zifeng Ding, Yuan He, Rishabh, Waleed Alzarooni, Ziyu Ye, Wendong Fan, Bailan He, Haige Bo, Changran Hu, Guohao Li

    Abstract: Large language models (LLMs) often struggle with mathematical problems that require exact computation or multi-step algebraic reasoning. Tool-integrated reasoning (TIR) offers a promising solution by leveraging external tools such as code interpreters to ensure correctness, but it introduces inference-time dependencies that hinder scalability and deployment. In this work, we propose a new paradigm…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Accepted in Workshop in Multi-Agent Systems in the Era of Foundation Models: Opportunities, Challenges and Futures, ICML 2025

  27. arXiv:2506.18939  [pdf, ps, other]

    cs.CV cs.AI

    Damba-ST: Domain-Adaptive Mamba for Efficient Urban Spatio-Temporal Prediction

    Authors: Rui An, Yifeng Zhang, Ziran Liang, Wenqi Fan, Yuxuan Liang, Xuequn Shang, Qing Li

    Abstract: Training urban spatio-temporal foundation models that generalize well across diverse regions and cities is critical for deploying urban services in unseen or data-scarce regions. Recent studies have typically focused on fusing cross-domain spatio-temporal data to train unified Transformer-based models. However, these models suffer from quadratic computational complexity and high memory overhead, l…

    Submitted 22 June, 2025; originally announced June 2025.

  28. arXiv:2506.18904  [pdf, ps, other]

    cs.CV

    TC-Light: Temporally Consistent Relighting for Dynamic Long Videos

    Authors: Yang Liu, Chuanchen Luo, Zimo Tang, Yingyan Li, Yuran Yang, Yuanyong Ning, Lue Fan, Junran Peng, Zhaoxiang Zhang

    Abstract: Editing illumination in long videos with complex dynamics has significant value in various downstream tasks, including visual content creation and manipulation, as well as data scaling up for embodied AI through sim2real and real2real transfer. Nevertheless, existing video relighting techniques are predominantly limited to portrait videos or fall into the bottleneck of temporal consistency and com…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: Project Page: https://dekuliutesla.github.io/tclight/ Code: https://github.com/Linketic/TC-Light

  29. arXiv:2506.18679  [pdf, ps, other]

    cs.CV

    MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation

    Authors: Ruicheng Zhang, Yu Sun, Zeyu Zhang, Jinai Li, Xiaofan Liu, Au Hoi Fan, Haowei Guo, Puxin Yan

    Abstract: We introduce MARL-MambaContour, the first contour-based medical image segmentation framework based on Multi-Agent Reinforcement Learning (MARL). Our approach reframes segmentation as a multi-agent cooperation task focused on generating topologically consistent object-level contours, addressing the limitations of traditional pixel-based methods which could lack topological constraints and holistic st…

    Submitted 23 June, 2025; originally announced June 2025.

  30. arXiv:2506.18533  [pdf, ps, other]

    cs.CV

    Geometry-aware Distance Measure for Diverse Hierarchical Structures in Hyperbolic Spaces

    Authors: Pengxiang Li, Yuwei Wu, Zhi Gao, Xiaomeng Fan, Wei Wu, Zhipeng Lu, Yunde Jia, Mehrtash Harandi

    Abstract: Learning in hyperbolic spaces has attracted increasing attention due to its superior ability to model hierarchical structures of data. Most existing hyperbolic learning methods use fixed distance measures for all data, assuming a uniform hierarchy across all data points. However, real-world hierarchical structures exhibit significant diversity, making this assumption overly restrictive. In this pa…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 24 pages

  31. arXiv:2506.18529  [pdf, ps, other]

    cs.CV cs.LG

    A Set-to-Set Distance Measure in Hyperbolic Space

    Authors: Pengxiang Li, Wei Wu, Zhi Gao, Xiaomeng Fan, Peilin Yu, Yuwei Wu, Zhipeng Lu, Yunde Jia, Mehrtash Harandi

    Abstract: We propose a hyperbolic set-to-set distance measure for computing dissimilarity between sets in hyperbolic space. While point-to-point distances in hyperbolic space effectively capture hierarchical relationships between data points, many real-world applications require comparing sets of hyperbolic data points, where the local structure and the global structure of the sets carry crucial semantic in…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 24 pages

  32. arXiv:2506.18398  [pdf, ps, other]

    cs.SE

    Your Token Becomes Worthless: Unveiling Rug Pull Schemes in Crypto Token via Code-and-Transaction Fusion Analysis

    Authors: Hao Wu, Haijun Wang, Shangwang Li, Yin Wu, Ming Fan, Wuxia Jin, Yitao Zhao, Ting Liu

    Abstract: Rug pull scams have emerged as a persistent threat to cryptocurrency, causing significant financial losses. A typical scenario involves scammers deploying honeypot contracts to attract investments, restricting token sales, and draining the funds, which leaves investors with worthless tokens. Current methods either rely on predefined patterns to detect code risks or utilize statistical transaction…

    Submitted 23 June, 2025; originally announced June 2025.

  33. arXiv:2506.18348  [pdf, ps, other]

    cs.AI

    Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team

    Authors: Weilun Yu, Shixiang Tang, Yonggui Huang, Nanqing Dong, Li Fan, Honggang Qi, Wei Liu, Xiaoli Diao, Xi Chen, Wanli Ouyang

    Abstract: Scientific progress increasingly relies on effective collaboration among researchers, a dynamic that large language models (LLMs) have only begun to emulate. While recent LLM-based scientist agents show promise in autonomous scientific discovery, they often lack the interactive reasoning and evaluation mechanisms essential to real-world research. We propose IDVSCI (Internal Discussion and Vote SCI…

    Submitted 23 June, 2025; originally announced June 2025.

  34. arXiv:2506.18317  [pdf, ps, other]

    cs.HC cs.NI cs.RO

    Crowdsourcing Ubiquitous Indoor Localization with Non-Cooperative Wi-Fi Ranging

    Authors: Emerson Sie, Enguang Fan, Federico Cifuentes-Urtubey, Deepak Vasisht

    Abstract: Indoor localization opens the path to potentially transformative applications. Although many indoor localization methods have been proposed over the years, they remain too impractical for widespread deployment in the real world. In this paper, we introduce PeepLoc, a deployable and scalable Wi-Fi-based solution for indoor localization that relies only on pre-existing devices and infrastructure. Sp…

    Submitted 23 June, 2025; originally announced June 2025.

  35. arXiv:2506.18304  [pdf]

    cs.LG cs.AI

    Sharpening the Spear: Adaptive Expert-Guided Adversarial Attack Against DRL-based Autonomous Driving Policies

    Authors: Junchao Fan, Xuyang Lei, Xiaolin Chang

    Abstract: Deep reinforcement learning (DRL) has emerged as a promising paradigm for autonomous driving. However, despite their advanced capabilities, DRL-based policies remain highly vulnerable to adversarial attacks, posing serious safety risks in real-world deployments. Investigating such attacks is crucial for revealing policy vulnerabilities and guiding the development of more robust autonomous systems.…

    Submitted 23 June, 2025; originally announced June 2025.

    Comments: 12 pages, 3 figures, 2 tables

  36. arXiv:2506.18290  [pdf, ps, other]

    cs.LG

    Instability in Diffusion ODEs: An Explanation for Inaccurate Image Reconstruction

    Authors: Han Zhang, Jinghong Mao, Shangwen Zhu, Zhantao Yang, Lianghua Huang, Yu Liu, Deli Zhao, Ruili Feng, Fan Cheng

    Abstract: Diffusion reconstruction plays a critical role in various applications such as image editing, restoration, and style transfer. In theory, the reconstruction should be simple - it just inverts and regenerates images by numerically solving the Probability Flow-Ordinary Differential Equation (PF-ODE). Yet in practice, noticeable reconstruction errors have been observed, which cannot be well explained…

    Submitted 23 June, 2025; originally announced June 2025.

  37. arXiv:2506.18270  [pdf]

    eess.IV cs.CV

    Adaptive Mask-guided K-space Diffusion for Accelerated MRI Reconstruction

    Authors: Qinrong Cai, Yu Guan, Zhibo Chen, Dong Liang, Qiuyun Fan, Qiegen Liu

    Abstract: As the deep learning revolution marches on, masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training, and has demonstrated exceptional performance in multiple fields. Magnetic Resonance Imaging (MRI) reconstruction is a critical task in medical imaging that seeks to recover high-quality images from unde…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 10 pages, 9 figures

  38. arXiv:2506.18251  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

    Authors: Chao Li, Jiawei Fan, Anbang Yao

    Abstract: In this paper, we present Morse, a simple dual-sampling framework for accelerating diffusion models losslessly. The key insight of Morse is to reformulate the iterative generation (from noise to data) process via taking advantage of fast jump sampling and adaptive residual feedback strategies. Specifically, Morse involves two models called Dash and Dot that interact with each other. The Dash model…

    Submitted 24 June, 2025; v1 submitted 22 June, 2025; originally announced June 2025.

    Comments: Fixed a prompt typo in Figure 18 of the Appendix. This work is accepted to ICML 2025. The project page: https://github.com/deep-optimization/Morse

  39. arXiv:2506.18226  [pdf, ps, other]

    cs.CV cs.AI

    Make It Efficient: Dynamic Sparse Attention for Autoregressive Image Generation

    Authors: Xunzhi Xiang, Qi Fan

    Abstract: Autoregressive conditional image generation models have emerged as a dominant paradigm in text-to-image synthesis. These methods typically convert images into one-dimensional token sequences and leverage the self-attention mechanism, which has achieved remarkable success in natural language processing, to capture long-range dependencies, model global context, and ensure semantic coherence. However…

    Submitted 22 June, 2025; originally announced June 2025.

  40. arXiv:2506.17963  [pdf, ps, other]

    q-bio.BM cs.AI

    OmniESI: A unified framework for enzyme-substrate interaction prediction with progressive conditional deep learning

    Authors: Zhiwei Nie, Hongyu Zhang, Hao Jiang, Yutian Liu, Xiansong Huang, Fan Xu, Jie Fu, Zhixiang Ren, Yonghong Tian, Wen-Bin Zhang, Jie Chen

    Abstract: Understanding and modeling enzyme-substrate interactions is crucial for catalytic mechanism research, enzyme engineering, and metabolic engineering. Although a large number of predictive methods have emerged, they do not incorporate prior knowledge of enzyme catalysis to rationally modulate general protein-molecule features that are misaligned with catalytic patterns. To address this issue, we int…

    Submitted 22 June, 2025; originally announced June 2025.

  41. arXiv:2506.17645  [pdf, ps, other]

    cs.CV

    Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning

    Authors: Shih-Wen Liu, Hsuan-Yu Fan, Wei-Ta Chu, Fu-En Yang, Yu-Chiang Frank Wang

    Abstract: Automating medical report generation from histopathology images is a critical challenge requiring effective visual representations and domain-specific knowledge. Inspired by the common practices of human experts, we propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning (ICL) mechanism. Our method dynami…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: Accepted to MIDL 2025

  42. arXiv:2506.17559  [pdf, ps, other]

    cs.IT eess.SP

    Joint Transmission for Cellular Networks with Pinching Antennas: System Design and Analysis

    Authors: Enzhi Zhou, Jingjing Cui, Ziyue Liu, Zhiguo Ding, Pingzhi Fan

    Abstract: As an emerging flexible antenna technology for wireless communications, pinching-antenna systems offer distinct advantages in terms of cost efficiency and deployment flexibility. This paper investigates joint transmission strategies of the base station (BS) and pinching antennas (PAS), focusing specifically on how to cooperate efficiently between the BS and waveguide-mounted pinching antennas for…

    Submitted 20 June, 2025; originally announced June 2025.

  43. arXiv:2506.17368  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

    Authors: Zhenglin Lai, Mengyao Liao, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li, Bingzhe Wu

    Abstract: Large language models based on Mixture-of-Experts have achieved substantial gains in efficiency and scalability, yet their architectural uniqueness introduces underexplored safety alignment challenges. Existing safety alignment strategies, predominantly designed for dense models, are ill-suited to address MoE-specific vulnerabilities. In this work, we formalize and systematically study MoE model's…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 9 pages, 7 figures

  44. arXiv:2506.17296  [pdf, ps, other]

    cs.CL cs.AI

    Semantic uncertainty in advanced decoding methods for LLM generation

    Authors: Darius Foodeei, Simin Fan, Martin Jaggi

    Abstract: This study investigates semantic uncertainty in large language model (LLM) outputs across different decoding methods, focusing on emerging techniques like speculative sampling and chain-of-thought (CoT) decoding. Through experiments on question answering, summarization, and code generation tasks, we analyze how different decoding strategies affect both the diversity and reliability of model output…

    Submitted 17 June, 2025; originally announced June 2025.

  45. arXiv:2506.17267  [pdf, ps, other]

    cs.LG cs.AI

    CF-VLM: CounterFactual Vision-Language Fine-tuning

    Authors: Jusheng Zhang, Kaitong Cai, Yijia Fan, Jian Wang, Keze Wang

    Abstract: Recent advances in vision-language models (VLMs) have greatly improved cross-modal semantic understanding, yet significant limitations remain in fine-grained discrimination and deep causal reasoning tasks. Existing VLMs often rely on superficial statistical correlations, lacking the ability to capture the underlying causal logic between visual and textual content. To address this, we propose Count…

    Submitted 10 June, 2025; originally announced June 2025.

  46. arXiv:2506.17137  [pdf, ps, other]

    cs.CV

    On the Theory of Conditional Feature Alignment for Unsupervised Domain-Adaptive Counting

    Authors: Zhuonan Liang, Dongnan Liu, Jianan Fan, Yaxuan Song, Qiang Qu, Yu Yao, Peng Fu, Weidong Cai

    Abstract: Object counting models suffer when deployed across domains with differing density variety, since density shifts are inherently task-relevant and violate standard domain adaptation assumptions. To address this, we propose a theoretical framework of conditional feature alignment. We first formalize the notion of conditional divergence by partitioning each domain into subsets (e.g., object vs. backgr…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 18 pages, 5 figures, 8 tables

  47. arXiv:2506.17114  [pdf, ps, other]

    cs.AI

    Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models

    Authors: Dadi Guo, Jiayu Liu, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung

    Abstract: Large reasoning models (e.g., R1, o3) have demonstrated remarkable mathematical problem-solving abilities. However, the high reported accuracy of these advanced models on popular datasets, together with reliance on purely numerical evaluation and potential benchmark leakage, often masks their true reasoning shortcomings. To address this, we propose leveraging the inherent rigor and methodological complexity of…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  48. Pyramid Mixer: Multi-dimensional Multi-period Interest Modeling for Sequential Recommendation

    Authors: Zhen Gong, Zhifang Fan, Hui Lu, Qiwei Chen, Chenbin Zhang, Lin Guan, Yuchao Zheng, Feng Zhang, Xiao Yang, Zuotao Liu

    Abstract: Sequential recommendation, a critical task in recommendation systems, predicts the next user action based on the understanding of the user's historical behaviors. Conventional studies mainly focus on cross-behavior modeling with self-attention-based methods while neglecting comprehensive user interest modeling across more dimensions. In this study, we propose a novel sequential recommendation model,…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGIR'25

  49. arXiv:2506.16931  [pdf, ps, other]

    cs.AI cs.RO

    Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning

    Authors: Jiaqi Chen, Mingfeng Fan, Xuefeng Zhang, Jingsong Liang, Yuhong Cao, Guohua Wu, Guillaume Adrien Sartoretti

    Abstract: Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimod…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 14 pages, 6 figures, under review

  50. arXiv:2506.16874  [pdf, ps, other]

    cs.HC

    Exploring the Usage of Generative AI for Group Project-Based Offline Art Courses in Elementary Schools

    Authors: Zhiqing Wang, Haoxiang Fan, Shiwei Wu, Qiaoyi Chen, Yongqi Liang, Zhenhui Peng

    Abstract: The integration of Generative Artificial Intelligence (GenAI) in K-6 project-based art courses presents both opportunities and challenges for enhancing creativity, engagement, and group collaboration. This study introduces a four-phase field study, involving a total of two experienced K-6 art teachers and 132 students in eight offline course sessions, to investigate the usage and impact of GenAI. Sp…

    Submitted 20 June, 2025; originally announced June 2025.