Showing 1–50 of 457 results for author: Fan, C

Searching in archive cs.
  1. arXiv:2507.04631  [pdf, ps, other]

    cs.CV cs.AI cs.RO

    Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts

    Authors: Yun Wang, Longguang Wang, Chenghao Zhang, Yongjian Zhang, Zhanjie Zhang, Ao Ma, Chenyou Fan, Tin Lun Lam, Junjie Hu

    Abstract: Recently, learning-based stereo matching networks have advanced significantly. However, they often lack robustness and struggle to achieve impressive cross-domain performance due to domain shifts and imbalanced disparity distributions among diverse datasets. Leveraging Vision Foundation Models (VFMs) can intuitively enhance the model's robustness, but integrating such a model into stereo matching…

    Submitted 6 July, 2025; originally announced July 2025.

    Journal ref: ICCV 2025

  2. arXiv:2507.03430  [pdf, ps, other]

    cs.LG cs.AI

    Multi-Level Fusion Graph Neural Network for Molecule Property Prediction

    Authors: XiaYu Liu, Hou-biao Li, Yang Liu, Chao Fan

    Abstract: Accurate molecular property prediction is essential in drug discovery and related fields. However, existing graph neural networks (GNNs) often struggle to simultaneously capture both local and global molecular structures. In this work, we propose a Multi-Level Fusion Graph Neural Network (MLFGNN) that integrates Graph Attention Networks and a novel Graph Transformer to jointly model local and glob…

    Submitted 4 July, 2025; originally announced July 2025.

    Comments: 38 pages, 11 figures, 6 tables

    MSC Class: 68T07 ACM Class: I.2.6

  3. arXiv:2507.02379  [pdf]

    cs.AI q-bio.BM

    An AI-native experimental laboratory for autonomous biomolecular engineering

    Authors: Mingyu Wu, Zhaoguo Wang, Jiabin Wang, Zhiyuan Dong, Jingkai Yang, Qingting Li, Tianyu Huang, Lei Zhao, Mingqiang Li, Fei Wang, Chunhai Fan, Haibo Chen

    Abstract: Autonomous scientific research, capable of independently conducting complex experiments and serving non-specialists, represents a long-held aspiration. Achieving it requires a fundamental paradigm shift driven by artificial intelligence (AI). While autonomous experimental systems are emerging, they remain confined to areas featuring singular objectives and well-defined, simple experimental workflo…

    Submitted 3 July, 2025; originally announced July 2025.

  4. arXiv:2507.00505  [pdf, ps, other]

    cs.CV

    LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs

    Authors: Haoran Lou, Chunxiao Fan, Ziyan Liu, Yuexin Wu, Xinliang Wang

    Abstract: The architecture of multimodal large language models (MLLMs) commonly connects a vision encoder, often based on CLIP-ViT, to a large language model. While CLIP-ViT works well for capturing global image features, it struggles to model local relationships between adjacent patches, leading to weaker visual representation, which in turn affects the detailed understanding ability of MLLMs. To solve thi…

    Submitted 4 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted to ICCV 2025

  5. arXiv:2506.22813  [pdf, ps, other]

    cs.CL

    Selecting and Merging: Towards Adaptable and Scalable Named Entity Recognition with Large Language Models

    Authors: Zhuojun Ding, Wei Wei, Chenghao Fan

    Abstract: Supervised fine-tuning (SFT) is widely used to align large language models (LLMs) with information extraction (IE) tasks, such as named entity recognition (NER). However, annotating such fine-grained labels and training domain-specific models is costly. Existing works typically train a unified model across multiple domains, but such approaches lack adaptation and scalability since not all training…

    Submitted 28 June, 2025; originally announced June 2025.

  6. arXiv:2506.21669  [pdf, ps, other]

    cs.AI

    SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents

    Authors: Wanxin Tian, Shijie Zhang, Kevin Zhang, Xiaowei Chi, Yulin Luo, Junyu Lu, Chunkai Fan, Qiang Zhou, Yiming Zhao, Ning Liu, Siyu Lin, Zhiyuan Qin, Xiaozhu Ju, Shanghang Zhang, Jian Tang

    Abstract: Self-evolution, the ability of agents to autonomously improve their reasoning and behavior, is essential for the embodied domain with long-horizon, real-world tasks. Despite current advancements in reinforcement fine-tuning (RFT) showing strong performance in enhancing reasoning in LLMs, its potential to enable self-evolving embodied intelligence with multi-modal interactions remains largely unexp…

    Submitted 26 June, 2025; originally announced June 2025.

  7. arXiv:2506.21551  [pdf, ps, other]

    cs.LG

    Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

    Authors: Ziyue Li, Chenrui Fan, Tianyi Zhou

    Abstract: Grokking, i.e., test performance keeps improving long after training loss converged, has been recently witnessed in neural network training, making the mechanism of generalization and other emerging capabilities such as reasoning mysterious. While prior studies usually train small models on a few toy or highly-specific tasks for thousands of epochs, we conduct the first study of grokking on checkp…

    Submitted 2 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

    Comments: 10 pages, 8 figures

  8. arXiv:2506.17368  [pdf, ps, other]

    cs.LG cs.AI cs.CR

    SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification

    Authors: Zhenglin Lai, Mengyao Liao, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li, Bingzhe Wu

    Abstract: Large language models based on Mixture-of-Experts have achieved substantial gains in efficiency and scalability, yet their architectural uniqueness introduces underexplored safety alignment challenges. Existing safety alignment strategies, predominantly designed for dense models, are ill-suited to address MoE-specific vulnerabilities. In this work, we formalize and systematically study MoE model's…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: 9 pages, 7 figures

  9. arXiv:2506.16112  [pdf, ps, other]

    cs.CV

    AutoV: Learning to Retrieve Visual Prompt for Large Vision-Language Models

    Authors: Yuan Zhang, Chun-Kai Fan, Tao Huang, Ming Lu, Sicheng Yu, Junwen Pan, Kuan Cheng, Qi She, Shanghang Zhang

    Abstract: Inspired by text prompts in large language models (LLMs), visual prompts have been explored to enhance the reasoning capabilities of large vision-language models (LVLMs). Current methods design heuristic visual prompts, such as overlaying a text-query-guided attention heatmap on the original input image. However, designing effective prompts manually is challenging and time-consuming, and it often…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 19 pages

  10. arXiv:2506.16064  [pdf, ps, other]

    cs.CL

    Self-Critique-Guided Curiosity Refinement: Enhancing Honesty and Helpfulness in Large Language Models via In-Context Learning

    Authors: Duc Hieu Ho, Chenglin Fan

    Abstract: Large language models (LLMs) have demonstrated robust capabilities across various natural language tasks. However, producing outputs that are consistently honest and helpful remains an open challenge. To overcome this challenge, this paper tackles the problem through two complementary directions. It conducts a comprehensive benchmark evaluation of ten widely used large language models, including b…

    Submitted 19 June, 2025; originally announced June 2025.

  11. arXiv:2506.13533  [pdf, ps, other]

    cs.LG cs.DS

    Learning Augmented Graph $k$-Clustering

    Authors: Chenglin Fan, Kijun Shin

    Abstract: Clustering is a fundamental task in unsupervised learning. Previous research has focused on learning-augmented $k$-means in Euclidean metrics, limiting its applicability to complex data representations. In this paper, we generalize learning-augmented $k$-clustering to operate on general metrics, enabling its application to graph-structured and non-Euclidean domains. Our framework also relaxes rest…

    Submitted 16 June, 2025; originally announced June 2025.

  12. arXiv:2506.12963  [pdf, ps, other]

    cs.AI cs.LG

    Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills

    Authors: Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu

    Abstract: Recent advances in large reasoning models (LRMs) have enabled strong chain-of-thought (CoT) generation through test-time computation. While these multi-step reasoning capabilities represent a major milestone in language model performance, they also introduce new safety risks. In this work, we present the first systematic study to revisit the problem of machine unlearning in the context of LRMs. Ma…

    Submitted 15 June, 2025; originally announced June 2025.

  13. arXiv:2506.11077  [pdf, ps, other]

    cs.CL

    CyclicReflex: Improving Large Reasoning Models via Cyclical Reflection Token Scheduling

    Authors: Chongyu Fan, Yihua Zhang, Jinghan Jia, Alfred Hero, Sijia Liu

    Abstract: Large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, harness test-time scaling to perform multi-step reasoning for complex problem-solving. This reasoning process, executed before producing final answers, is often guided by special juncture tokens or textual segments that prompt self-evaluative reflection. We refer to these transition markers and reflective cues as "reflection token…

    Submitted 3 June, 2025; originally announced June 2025.

  14. arXiv:2506.11046  [pdf, other]

    cs.LG

    The Effects of Data Augmentation on Confidence Estimation for LLMs

    Authors: Rui Wang, Renyu Zhu, Minmin Lin, Runze Wu, Tangjie Lv, Changjie Fan, Haobo Wang

    Abstract: Confidence estimation is crucial for reflecting the reliability of large language models (LLMs), particularly in the widely used closed-source models. Utilizing data augmentation for confidence estimation is viable, but discussions focus on specific augmentation techniques, limiting its potential. We study the impact of different data augmentation methods on confidence estimation. Our findings ind…

    Submitted 21 May, 2025; originally announced June 2025.

  15. arXiv:2506.08337  [pdf, ps, other]

    cs.LG stat.ML

    A Simple Analysis of Discretization Error in Diffusion Models

    Authors: Juhyeok Choi, Chenglin Fan

    Abstract: Diffusion models, formulated as discretizations of stochastic differential equations (SDEs), achieve state-of-the-art generative performance. However, existing analyses of their discretization error often rely on complex probabilistic tools. In this work, we present a simplified theoretical framework for analyzing the Euler--Maruyama discretization of variance-preserving SDEs (VP-SDEs) in Denoisin…

    Submitted 9 June, 2025; originally announced June 2025.

  16. arXiv:2506.07454  [pdf, ps, other]

    cs.RO cs.AI

    Language-Grounded Hierarchical Planning and Execution with Multi-Robot 3D Scene Graphs

    Authors: Jared Strader, Aaron Ray, Jacob Arkin, Mason B. Peterson, Yun Chang, Nathan Hughes, Christopher Bradley, Yi Xuan Jia, Carlos Nieto-Granda, Rajat Talak, Chuchu Fan, Luca Carlone, Jonathan P. How, Nicholas Roy

    Abstract: In this paper, we introduce a multi-robot system that integrates mapping, localization, and task and motion planning (TAMP) enabled by 3D scene graphs to execute complex instructions expressed in natural language. Our system builds a shared 3D scene graph incorporating an open-set object-based map, which is leveraged for multi-robot 3D scene graph fusion. This representation supports real-time, vi…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: 12 pages, 4 figures

  17. arXiv:2506.05281  [pdf, ps, other]

    cs.LG cs.AI

    Fast-DataShapley: Neural Modeling for Training Data Valuation

    Authors: Haifeng Sun, Yu Xiong, Runze Wu, Xinyu Cai, Changjie Fan, Lan Zhang, Xiang-Yang Li

    Abstract: The value and copyright of training data are crucial in the artificial intelligence industry. Service platforms should protect data providers' legitimate rights and fairly reward them for their contributions. Shapley value, a potent tool for evaluating contributions, outperforms other methods in theory, but its computational overhead escalates exponentially with the number of data providers. Recen…

    Submitted 12 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  18. arXiv:2506.04699  [pdf, ps, other]

    cs.AI

    Empowering Economic Simulation for Massively Multiplayer Online Games through Generative Agent-Based Modeling

    Authors: Bihan Xu, Shiwei Zhao, Runze Wu, Zhenya Huang, Jiawei Wang, Zhipeng Hu, Kai Wang, Haoyu Liu, Tangjie Lv, Le Li, Changjie Fan, Xin Tong, Jiangze Han

    Abstract: Within the domain of Massively Multiplayer Online (MMO) economy research, Agent-Based Modeling (ABM) has emerged as a robust tool for analyzing game economics, evolving from rule-based agents to decision-making agents enhanced by reinforcement learning. Nevertheless, existing works encounter significant challenges when attempting to emulate human-like economic activities among agents, particularly…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: KDD2025 Accepted

  19. arXiv:2506.04602  [pdf, ps, other]

    cs.GT cs.LG

    MVP-Shapley: Feature-based Modeling for Evaluating the Most Valuable Player in Basketball

    Authors: Haifeng Sun, Yu Xiong, Runze Wu, Kai Wang, Lan Zhang, Changjie Fan, Shaojie Tang, Xiang-Yang Li

    Abstract: The burgeoning growth of the esports and multiplayer online gaming community has highlighted the critical importance of evaluating the Most Valuable Player (MVP). The establishment of an explainable and practical MVP evaluation method is very challenging. In our study, we specifically focus on play-by-play data, which records related events during the game, such as assists and points. We aim to ad…

    Submitted 4 June, 2025; originally announced June 2025.

  20. arXiv:2506.04205  [pdf, ps, other]

    cs.LG

    EPiC: Towards Lossless Speedup for Reasoning Training through Edge-Preserving CoT Condensation

    Authors: Jinghan Jia, Hadi Reisizadeh, Chongyu Fan, Nathalie Baracaldo, Mingyi Hong, Sijia Liu

    Abstract: Large language models (LLMs) have shown remarkable reasoning capabilities when trained with chain-of-thought (CoT) supervision. However, the long and verbose CoT traces, especially those distilled from large reasoning models (LRMs) such as DeepSeek-R1, significantly increase training costs during the distillation process, where a non-reasoning base model is taught to replicate the reasoning behavi…

    Submitted 4 June, 2025; originally announced June 2025.

  21. arXiv:2506.02594  [pdf, ps, other]

    cs.AI

    EALG: Evolutionary Adversarial Generation of Language Model-Guided Generators for Combinatorial Optimization

    Authors: Ruibo Duan, Yuxin Liu, Xinyao Dong, Chenglin Fan

    Abstract: Generating challenging instances is crucial for the evaluation and advancement of combinatorial optimization solvers. In this work, we introduce EALG (Evolutionary Adversarial Generation of Language Model-Guided Generators), a novel framework that automates the co-evolution of optimization problem instances and their corresponding heuristic solvers using large language models (LLMs). EALG leverage…

    Submitted 3 June, 2025; originally announced June 2025.

  22. arXiv:2506.00466  [pdf, ps, other]

    eess.AS cs.SD

    M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction

    Authors: Cunhang Fan, Ying Chen, Jian Zhou, Zexu Pan, Jingjing Zhang, Youdian Gao, Xiaoke Yang, Zhengqi Wen, Zhao Lv

    Abstract: The brain-assisted target speaker extraction (TSE) aims to extract the attended speech from mixed speech by utilizing the brain neural activities, for example Electroencephalography (EEG). However, existing models overlook the issue of temporal misalignment between speech and EEG modalities, which hampers TSE performance. In addition, the speech encoder in current models typically uses basic tempo…

    Submitted 31 May, 2025; originally announced June 2025.

    Comments: Accepted to IJCAI 2025

  23. arXiv:2506.00429  [pdf, ps, other]

    math.CO cs.IT

    Combinatorial $t$-Designs from Finite Abelian Groups and Their Applications to Elliptic Curve Codes

    Authors: Hengfeng Liu, Chunming Tang, Cuiling Fan, Rong Luo

    Abstract: In this paper, we establish the conditions for some finite abelian groups and the family of all the $k$-sets in each of them summing up to an element $x$ to form $t$-designs. We fully characterize the sufficient and necessary conditions for the incidence structures to form $1$-designs in finite abelian $p$-groups, generalizing existing results on vector spaces over finite fields. For finite abelian g…

    Submitted 31 May, 2025; originally announced June 2025.

    MSC Class: 05B05; 94B05

  24. arXiv:2506.00375  [pdf, ps, other]

    cs.SD eess.AS

    RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection

    Authors: Ruibo Fu, Xiaopeng Wang, Zhengqi Wen, Jianhua Tao, Yuankun Xie, Zhiyong Wang, Chunyu Qiang, Xuefei Liu, Cunhang Fan, Chenxing Li, Guanjun Li

    Abstract: Existing methods for deepfake audio detection have demonstrated some effectiveness. However, they still face challenges in generalizing to new forgery techniques and evolving attack patterns. This limitation mainly arises because the models rely heavily on the distribution of the training data and fail to learn a decision boundary that captures the essential characteristics of forgeries. Additiona…

    Submitted 31 May, 2025; originally announced June 2025.

  25. arXiv:2505.24156  [pdf, ps, other]

    cs.CV cs.RO

    Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction

    Authors: Chenyou Fan, Fangzheng Yan, Chenjia Bai, Jiepeng Wang, Chi Zhang, Zhen Wang, Xuelong Li

    Abstract: Learning a generalizable bimanual manipulation policy is extremely challenging for embodied agents due to the large action space and the need for coordinated arm movements. Existing approaches rely on Vision-Language-Action (VLA) models to acquire bimanual policies. However, transferring knowledge from single-arm datasets or pre-trained VLA models often fails to generalize effectively, primarily d…

    Submitted 29 May, 2025; originally announced May 2025.

  26. arXiv:2505.21939  [pdf, ps, other]

    cs.DS

    Improved Approximation Algorithms for Chromatic and Pseudometric-Weighted Correlation Clustering

    Authors: Dahoon Lee, Chenglin Fan, Euiwoong Lee

    Abstract: Correlation Clustering (CC) is a foundational problem in unsupervised learning that models binary similarity relations using labeled graphs. While classical CC has been widely studied, many real-world applications involve more nuanced relationships, either multi-class categorical interactions or varying confidence levels in edge labels. To address these, two natural generalizations have been propo…

    Submitted 27 May, 2025; originally announced May 2025.

  27. arXiv:2505.21668  [pdf, other]

    cs.AI cs.CL cs.SC

    R1-Code-Interpreter: Training LLMs to Reason with Code via Supervised and Reinforcement Learning

    Authors: Yongchao Chen, Yueying Liu, Junwei Zhou, Yilun Hao, Jingquan Wang, Yang Zhang, Chuchu Fan

    Abstract: Despite advances in reasoning and planning of R1-like models, Large Language Models (LLMs) still struggle with tasks requiring precise computation, symbolic manipulation, optimization, and algorithmic reasoning, in which textual reasoning lacks the rigor of code execution. A key challenge is enabling LLMs to decide when to use textual reasoning versus code generation. While OpenAI trains models to…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: 33 pages, 8 figures

  28. arXiv:2505.20573  [pdf, ps, other]

    cs.RO cs.AI

    Collision- and Reachability-Aware Multi-Robot Control with Grounded LLM Planners

    Authors: Jiabao Ji, Yongchao Chen, Yang Zhang, Ramana Rao Kompella, Chuchu Fan, Gaowen Liu, Shiyu Chang

    Abstract: Large language models (LLMs) have demonstrated strong performance in various robot control tasks. However, their deployment in real-world applications remains constrained. Even state-of-the-art LLMs, such as GPT-o4mini, frequently produce invalid action plans that violate physical constraints, such as directing a robot to an unreachable location or causing collisions between robots. This issue prim…

    Submitted 3 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  29. arXiv:2505.20218  [pdf, other]

    cs.LG

    Fine-grained List-wise Alignment for Generative Medication Recommendation

    Authors: Chenxiao Fan, Chongming Gao, Wentao Shi, Yaxin Gong, Zihao Zhao, Fuli Feng

    Abstract: Accurate and safe medication recommendations are critical for effective clinical decision-making, especially in multimorbidity cases. However, existing systems rely on point-wise prediction paradigms that overlook synergistic drug effects and potential adverse drug-drug interactions (DDIs). We propose FLAME, a fine-grained list-wise alignment framework for large language models (LLMs), enabling dr…

    Submitted 26 May, 2025; originally announced May 2025.

  30. arXiv:2505.18582  [pdf, ps, other]

    cs.CV cs.AI

    On Denoising Walking Videos for Gait Recognition

    Authors: Dongyang Jin, Chao Fan, Jingzhe Ma, Jingkai Zhou, Weihua Chen, Shiqi Yu

    Abstract: To capture individual gait patterns, excluding identity-irrelevant cues in walking videos, such as clothing texture and color, remains a persistent challenge for vision-based gait recognition. Traditional silhouette- and pose-based methods, though theoretically effective at removing such distractions, often fall short of high accuracy due to their sparse and less informative inputs. Emerging end-t…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 8 pages, 4 figures

  31. arXiv:2505.18132  [pdf, ps, other]

    cs.CV

    BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models

    Authors: Dingqiang Ye, Chao Fan, Zhanbo Huang, Chengwen Luo, Jianqiang Li, Shiqi Yu, Xiaoming Liu

    Abstract: Large vision models (LVM) based gait recognition has achieved impressive performance. However, existing LVM-based approaches may overemphasize gait priors while neglecting the intrinsic value of LVM itself, particularly the rich, distinct representations across its multi-layers. To adequately unlock LVM's potential, this work investigates the impact of layer-wise representations on downstream reco…

    Submitted 17 June, 2025; v1 submitted 23 May, 2025; originally announced May 2025.

  32. arXiv:2505.16246  [pdf, ps, other]

    cs.CR

    Verifying Differentially Private Median Estimation

    Authors: Hyukjun Kwon, Chenglin Fan

    Abstract: Differential Privacy (DP) is a robust privacy guarantee that is widely employed in private data analysis today, finding broad application in domains such as statistical query release and machine learning. However, DP achieves privacy by introducing noise into data or query answers, which malicious actors could exploit during analysis. To address this concern, we propose the first verifiable differ…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: 22 pages

  33. arXiv:2505.15364  [pdf, ps, other]

    cs.HC cs.SD eess.AS

    MHANet: Multi-scale Hybrid Attention Network for Auditory Attention Detection

    Authors: Lu Li, Cunhang Fan, Hongyu Zhang, Jingjing Zhang, Xiaoke Yang, Jian Zhou, Zhao Lv

    Abstract: Auditory attention detection (AAD) aims to detect the target speaker in a multi-talker environment from brain signals, such as electroencephalography (EEG), which has made great progress. However, most AAD methods solely utilize attention mechanisms sequentially and overlook valuable multi-scale contextual information within EEG signals, limiting their ability to capture long-short range spatiotem…

    Submitted 21 May, 2025; originally announced May 2025.

  34. arXiv:2505.11838  [pdf, ps, other]

    cs.CV

    RVTBench: A Benchmark for Visual Reasoning Tasks

    Authors: Yiqing Shen, Chenjia Li, Chenxiao Fan, Mathias Unberath

    Abstract: Visual reasoning, the capability to interpret visual input in response to implicit text query through multi-step reasoning, remains a challenge for deep learning models due to the lack of relevant benchmarks. Previous work in visual reasoning has primarily focused on reasoning segmentation, where models aim to segment objects based on implicit text queries. This paper introduces reasoning visual t…

    Submitted 17 May, 2025; originally announced May 2025.

  35. arXiv:2505.10348  [pdf, ps, other]

    cs.HC cs.SD eess.AS

    ListenNet: A Lightweight Spatio-Temporal Enhancement Nested Network for Auditory Attention Detection

    Authors: Cunhang Fan, Xiaoke Yang, Hongyu Zhang, Ying Chen, Lu Li, Jian Zhou, Zhao Lv

    Abstract: Auditory attention detection (AAD) aims to identify the direction of the attended speaker in multi-speaker environments from brain signals, such as Electroencephalography (EEG) signals. However, existing EEG-based AAD methods overlook the spatio-temporal dependencies of EEG signals, limiting their decoding and generalization abilities. To address these issues, this paper proposes a Lightweight Spa…

    Submitted 15 May, 2025; originally announced May 2025.

  36. arXiv:2505.02152  [pdf, other]

    cs.RO

    Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

    Authors: Cunxin Fan, Xiaosong Jia, Yihang Sun, Yixiao Wang, Jianglan Wei, Ziyang Gong, Xiangyu Zhao, Masayoshi Tomizuka, Xue Yang, Junchi Yan, Mingyu Ding

    Abstract: Vision-Language-Action (VLA) models have shown great promise for generalist robotic manipulation in the physical world. However, existing models are restricted to robot observations and text-only instructions, lacking the flexibility of interleaved multimodal instructions enabled by recent advances in foundation models in the digital world. In this paper, we present Interleave-VLA, the first frame…

    Submitted 4 May, 2025; originally announced May 2025.

  37. arXiv:2505.01656  [pdf, other]

    cs.CV

    A Novel WaveInst-based Network for Tree Trunk Structure Extraction and Pattern Analysis in Forest Inventory

    Authors: Chenyang Fan, Xujie Zhu, Taige Luo, Sheng Xu, Zhulin Chen, Hongxin Yang

    Abstract: The pattern analysis of tree structure holds significant scientific value for genetic breeding and forestry management. The current trunk and branch extraction technologies are mainly LiDAR-based or UAV-based. The former approaches obtain high-precision 3D data, but its equipment cost is high and the three-dimensional (3D) data processing is complex. The latter approaches efficiently capture canop…

    Submitted 2 May, 2025; originally announced May 2025.

  38. arXiv:2505.00562  [pdf, other]

    cs.RO cs.AI cs.FL cs.LG

    TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching

    Authors: Yue Meng, Chuchu Fan

    Abstract: Learning to solve complex tasks with signal temporal logic (STL) specifications is crucial to many real-world applications. However, most previous works only consider fixed or parametrized STL specifications due to the lack of a diverse STL dataset and encoders to effectively extract temporal logic information for downstream tasks. In this paper, we propose TeLoGraF, Temporal Logic Graph-encoded F…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted to ICML2025

  39. An Empirical Analysis of Compatibility Issues for Industrial Mobile Games

    Authors: Zihe Song, Yingfeng Chen, Lei Ma, Shangjie Lu, Honglei Lin, Changjie Fan, Wei Yang

    Abstract: Detecting and fixing compatibility issues is critical for mobile game development. The rapid evolution of mobile operating systems and device fragmentation make it challenging for developers to timely address these issues across diverse models. Undetected compatibility problems can severely impact user experience and cause financial loss to companies and players. However, mobile game testing remai…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Accepted at ISSRE 2022

  40. arXiv:2504.15425  [pdf, other]

    cs.RO cs.AI cs.LG cs.MA math.OC

    Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL

    Authors: Songyuan Zhang, Oswin So, Mitchell Black, Zachary Serlin, Chuchu Fan

    Abstract: Tasks for multi-robot systems often require the robots to collaborate and complete a team goal while maintaining safety. This problem is usually formalized as a constrained Markov decision process (CMDP), which targets minimizing a global cost and bringing the mean of constraint violation below a user-defined threshold. Inspired by real-world robotic applications, we define safety as zero constrai…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 28 pages, 16 figures; Accepted by Robotics: Science and Systems 2025

  41. arXiv:2504.14977  [pdf, other]

    cs.CV

    RealisDance-DiT: Simple yet Strong Baseline towards Controllable Character Animation in the Wild

    Authors: Jingkai Zhou, Yifan Wu, Shikai Li, Min Wei, Chao Fan, Weihua Chen, Wei Jiang, Fan Wang

    Abstract: Controllable character animation remains a challenging problem, particularly in handling rare poses, stylized characters, character-object interactions, complex illumination, and dynamic scenes. To tackle these issues, prior work has largely focused on injecting pose and appearance guidance via elaborate bypass networks, but often struggles to generalize to open-world scenarios. In this paper, we…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Project Page: https://thefoxofsky.github.io/project_pages_new/RealisDance-DiT/index

  42. arXiv:2504.12737  [pdf, other]

    cs.CL

    Chinese-Vicuna: A Chinese Instruction-following Llama-based Model

    Authors: Chenghao Fan, Zhenyi Lu, Jie Tian

    Abstract: Chinese-Vicuna is an open-source, resource-efficient language model designed to bridge the gap in Chinese instruction-following capabilities by fine-tuning Meta's LLaMA architecture using Low-Rank Adaptation (LoRA). Targeting low-resource environments, it enables cost-effective deployment on consumer GPUs (e.g., RTX-2080Ti for 7B models) and supports domain-specific adaptation in fields like healt…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Chinese-Vicuna Technique Report

  43. arXiv:2504.10514  [pdf, ps, other]

    cs.CV cs.AI cs.CL cs.LG

    ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

    Authors: Yijun Liang, Ming Li, Chenrui Fan, Ziyue Li, Dang Nguyen, Kwesi Cobbina, Shweta Bhardwaj, Jiuhai Chen, Fuxiao Liu, Tianyi Zhou

    Abstract: Color plays an important role in human perception and usually provides critical clues in visual reasoning. However, it is unclear whether and how vision-language models (VLMs) can perceive, understand, and leverage color as humans. This paper introduces ColorBench, an innovative benchmark meticulously crafted to assess the capabilities of VLMs in color understanding, including color perception, re…

    Submitted 12 June, 2025; v1 submitted 10 April, 2025; originally announced April 2025.

    Comments: 36 pages, including references and appendix. Code is available at https://github.com/tianyi-lab/ColorBench

  44. arXiv:2504.06514  [pdf, other]

    cs.AI cs.CL cs.LG

    Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

    Authors: Chenrui Fan, Ming Li, Lichao Sun, Tianyi Zhou

    Abstract: We find that the response length of reasoning LLMs, whether trained by reinforcement learning or supervised learning, drastically increases for ill-posed questions with missing premises (MiP), ending up with redundant and ineffective thinking. This newly introduced scenario exacerbates the general overthinking issue to a large extent, which we name as the MiP-Overthinking. Such failures are agains…

    Submitted 10 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  45. arXiv:2504.03354  [pdf, ps, other]

    cs.DS

    A Generalized Binary Tree Mechanism for Differentially Private Approximation of All-Pair Distances

    Authors: Michael Dinitz, Chenglin Fan, Jingcheng Liu, Jalaj Upadhyay, Zongrui Zou

    Abstract: We study the problem of approximating all-pair distances in a weighted undirected graph with differential privacy, introduced by Sealfon [Sea16]. Given a publicly known undirected graph, we treat the weights of edges as sensitive information, and two graphs are neighbors if their edge weights differ in one edge by at most one. We obtain efficient algorithms with significantly improved bounds on a…

    Submitted 4 April, 2025; originally announced April 2025.

    Comments: 27 pages

  46. arXiv:2504.03015  [pdf, other]

    cs.RO

    AuDeRe: Automated Strategy Decision and Realization in Robot Planning and Control via LLMs

    Authors: Yue Meng, Fei Chen, Yongchao Chen, Chuchu Fan

    Abstract: Recent advancements in large language models (LLMs) have shown significant promise in various domains, especially robotics. However, most prior LLM-based work in robotic applications either directly predicts waypoints or applies LLMs within fixed tool integration frameworks, offering limited flexibility in exploring and configuring solutions best suited to different tasks. In this work, we propose…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 8 pages, 14 figures, submitted for CDC 2025 invited session on Large Language Models (LLMs) and Control

  47. arXiv:2503.22057  [pdf, other]

    cs.CE

    A production planning benchmark for real-world refinery-petrochemical complexes

    Authors: Wenli Du, Chuan Wang, Chen Fan, Zhi Li, Yeke Zhong, Tianao Kang, Ziting Liang, Minglei Yang, Feng Qian, Xin Dai

    Abstract: To achieve digital intelligence transformation and carbon neutrality, effective production planning is crucial for integrated refinery-petrochemical complexes. Modern refinery planning relies on advanced optimization techniques, whose development requires reproducible benchmark problems. However, existing benchmarks lack practical context or impose oversimplified assumptions, limiting their applic…

    Submitted 27 March, 2025; originally announced March 2025.

  48. arXiv:2503.17340  [pdf, other]

    cs.MM cs.AI cs.CV cs.SD eess.AS

    Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation

    Authors: Congyi Fan, Jian Guan, Xuanjia Zhao, Dongli Xu, Youtian Lin, Tong Ye, Pengming Feng, Haiwei Pan

    Abstract: Automatically generating natural, diverse and rhythmic human dance movements driven by music is vital for virtual reality and film industries. However, generating dance that naturally follows music remains a challenge, as existing methods lack proper beat alignment and exhibit unnatural motion dynamics. In this paper, we propose Danceba, a novel framework that leverages gating mechanism to enhance…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 10 pages, 6 figures

  49. arXiv:2503.08179  [pdf, other]

    q-bio.BM cs.AI

    ProtTeX: Structure-In-Context Reasoning and Editing of Proteins with Large Language Models

    Authors: Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Jun Zhang, Ziqiang Cao, Yi Qin Gao

    Abstract: Large language models have made remarkable progress in the field of molecular science, particularly in understanding and generating functional small molecules. This success is largely attributed to the effectiveness of molecular tokenization strategies. In protein science, the amino acid sequence serves as the sole tokenizer for LLMs. However, many fundamental challenges in protein science are inh…

    Submitted 13 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 26 pages, 9 figures

  50. arXiv:2503.08120  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

    Authors: Junzhe Li, Xuerui Qiu, Linrui Xu, Liya Guo, Delin Qu, Tingting Long, Chun Fan, Ming Li

    Abstract: Unified multimodal models (UMMs) have emerged as a powerful paradigm in foundational computer vision research, demonstrating significant potential in both image understanding and generation. However, existing research in the face domain primarily focuses on $\textbf{coarse}$ facial attribute understanding, with limited capacity to handle $\textbf{fine-grained}$ facial attributes and without addres…

    Submitted 25 March, 2025; v1 submitted 11 March, 2025; originally announced March 2025.