Showing 1–50 of 407 results for author: He, C

  1. arXiv:2505.10020  [pdf, ps, other]

    eess.SY cs.RO

    Threshold Strategy for Leaking Corner-Free Hamilton-Jacobi Reachability with Decomposed Computations

    Authors: Chong He, Mugilan Mariappan, Keval Vora, Mo Chen

    Abstract: Hamilton-Jacobi (HJ) Reachability is widely used to compute value functions for states satisfying specific control objectives. However, it becomes intractable for high-dimensional problems due to the curse of dimensionality. Dimensionality reduction approaches are essential for mitigating this challenge, whereas they could introduce the "leaking corner issue", leading to inaccuracies in the resul…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 7 pages, Submitted to Conference on Decision and Control (CDC)

  2. arXiv:2505.09144  [pdf, ps, other]

    cs.RO

    Latent Theory of Mind: A Decentralized Diffusion Architecture for Cooperative Manipulation

    Authors: Chengyang He, Gadiel Sznaier Camps, Xu Liu, Mac Schwager, Guillaume Sartoretti

    Abstract: We present Latent Theory of Mind (LatentToM), a decentralized diffusion policy architecture for collaborative robot manipulation. Our policy allows multiple manipulators with their own perception and computation to collaborate with each other towards a common task goal with or without explicit communication. Our key innovation lies in allowing each agent to maintain two latent representations: an…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.08686  [pdf, ps, other]

    cs.GR cs.CV cs.LG

    CAD-Coder:Text-Guided CAD Files Code Generation

    Authors: Changqi He, Shuhan Zhang, Liguo Zhang, Jiajun Miao

    Abstract: Computer-aided design (CAD) is a way to digitally create 2D drawings and 3D models of real-world products. Traditional CAD typically relies on hand-drawing by experts or modifications of existing library files, which doesn't allow for rapid personalization. With the emergence of generative artificial intelligence, convenient and efficient personalized CAD generation has become possible. However, e…

    Submitted 13 May, 2025; originally announced May 2025.

    Report number: ICCV 2025 Submission 11025

  4. arXiv:2505.07608  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

    Authors: Xiaomi LLM-Core Team: Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, et al. (40 additional authors not shown)

    Abstract: We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with additional Multi-Token Prediction objective…

    Submitted 12 May, 2025; originally announced May 2025.

  5. arXiv:2505.07247  [pdf, other]

    cs.CL cs.AI

    SAS-Bench: A Fine-Grained Benchmark for Evaluating Short Answer Scoring with Large Language Models

    Authors: Peichao Lai, Kexuan Zhang, Yi Lin, Linyihan Zhang, Feiyang Ye, Jinhao Yan, Yanwei Xu, Conghui He, Yilei Wang, Wentao Zhang, Bin Cui

    Abstract: Subjective Answer Grading (SAG) plays a crucial role in education, standardized testing, and automated assessment systems, particularly for evaluating short-form responses in Short Answer Scoring (SAS). However, existing approaches often produce coarse-grained scores and lack detailed reasoning. Although large language models (LLMs) have demonstrated potential as zero-shot evaluators, they remain…

    Submitted 15 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  6. arXiv:2505.06683  [pdf, other]

    cs.CV

    UnfoldIR: Rethinking Deep Unfolding Network in Illumination Degradation Image Restoration

    Authors: Chunming He, Rihan Zhang, Fengyang Xiao, Chengyu Fang, Longxiang Tang, Yulun Zhang, Sina Farsiu

    Abstract: Deep unfolding networks (DUNs) are widely employed in illumination degradation image restoration (IDIR) to merge the interpretability of model-based approaches with the generalization of learning-based methods. However, the performance of DUN-based methods remains considerably inferior to that of state-of-the-art IDIR solvers. Our investigation indicates that this limitation does not stem from str…

    Submitted 10 May, 2025; originally announced May 2025.

    Comments: 16 pages, 14 tables, 11 figures

  7. arXiv:2505.05787  [pdf, ps, other]

    cs.RO

    Demystifying Diffusion Policies: Action Memorization and Simple Lookup Table Alternatives

    Authors: Chengyang He, Xu Liu, Gadiel Sznaier Camps, Guillaume Sartoretti, Mac Schwager

    Abstract: Diffusion policies have demonstrated remarkable dexterity and robustness in intricate, high-dimensional robot manipulation tasks, while training from a small number of demonstrations. However, the reason for this performance remains a mystery. In this paper, we offer a surprising hypothesis: diffusion policies essentially memorize an action lookup table -- and this is beneficial. We posit that, at…

    Submitted 9 May, 2025; originally announced May 2025.

  8. arXiv:2505.04088  [pdf]

    cs.CV

    SMMT: Siamese Motion Mamba with Self-attention for Thermal Infrared Target Tracking

    Authors: Shang Zhang, Huanbin Zhang, Dali Feng, Yujie Cui, Ruoyan Xiong, Cen He

    Abstract: Thermal infrared (TIR) object tracking often suffers from challenges such as target occlusion, motion blur, and background clutter, which significantly degrade the performance of trackers. To address these issues, this paper proposes a novel Siamese Motion Mamba Tracker (SMMT), which integrates a bidirectional state-space model and a self-attention mechanism. Specifically, we introduce the Motion…

    Submitted 10 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  9. arXiv:2504.19093  [pdf, other]

    cs.CR cs.AI cs.PF

    CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

    Authors: Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, especially the recent advancements in reasoning, such as o1 and o3, pushing the boundaries of AI. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored. In this paper, we introduce CipherBank, a comprehensive benchmark…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: Work in progress

  10. arXiv:2504.16107  [pdf, other]

    eess.SP cs.IT

    Phased Array Calibration based on Rotating-Element Harmonic Electric-Field Vector with Time Modulation

    Authors: Shiyuan Li, Yuyue Zhou, Chi Zhang, Liang Kong, Kebin Liu, Yihan Xie, Chong He

    Abstract: Calibration is crucial for ensuring the performance of phased array since amplitude-phase imbalance between elements results in significant performance degradation. While amplitude-only calibration methods offer advantages when phase measurements are impractical, conventional approaches face two key challenges: they typically require high-resolution phase shifters and remain susceptible to phase e…

    Submitted 17 April, 2025; originally announced April 2025.

  11. arXiv:2504.15146  [pdf, other]

    cs.AI

    Behavioral Universe Network (BUN): A Behavioral Information-Based Framework for Complex Systems

    Authors: Wei Zhou, Ailiya Borjigin, Cong He

    Abstract: Modern digital ecosystems feature complex, dynamic interactions among autonomous entities across diverse domains. Traditional models often separate agents and objects, lacking a unified foundation to capture their interactive behaviors. This paper introduces the Behavioral Universe Network (BUN), a theoretical framework grounded in the Agent-Interaction-Behavior (AIB) formalism. BUN treats subject…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: 17 pages, 1 figure

  12. arXiv:2504.14600  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results

    Authors: Zheng Chen, Jingkai Wang, Kai Liu, Jue Gong, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Jianxing Zhang, Jinlong Wu, Jun Wang, Zheng Xie, Hakjae Jeon, Suejin Han, Hyung-Ju Chun, Hyunhee Park, Zhicun Yin, Junjie Chen, Ming Liu, Xiaoming Li, Chao Zhou, Wangmeng Zuo, Weixia Zhang, Dingquan Li, Kede Ma, et al. (29 additional authors not shown)

    Abstract: This paper provides a review of the NTIRE 2025 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural, realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_RealWorld_Face_Restoration

  13. arXiv:2504.14194  [pdf, other]

    cs.CL

    Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models

    Authors: Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He

    Abstract: The composition of pre-training datasets for large language models (LLMs) remains largely undisclosed, hindering transparency and efforts to optimize data quality, a critical driver of model performance. Current data selection methods, such as natural language quality assessments, diversity-based filters, and classifier-based approaches, are limited by single-dimensional evaluation or redundancy-f…

    Submitted 30 April, 2025; v1 submitted 19 April, 2025; originally announced April 2025.

    Comments: Under review

  14. arXiv:2504.12322  [pdf, other]

    cs.CL cs.AI cs.LG

    A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis

    Authors: Xin Gao, Qizhi Pei, Zinan Tang, Yu Li, Honglin Lin, Jiang Wu, Lijun Wu, Conghui He

    Abstract: While data synthesis and distillation are promising strategies to enhance small language models, current approaches heavily rely on Large Language Models (LLMs), which suffer from high computational costs, environmental inefficiency, and potential biases inherited from monolithic architectures. In contrast, smaller LLMs are more accessible and sustainable, but their individual capabilities often f…

    Submitted 21 April, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

  15. arXiv:2504.10519  [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Toward Super Agent System with Hybrid AI Routers

    Authors: Yuhang Yao, Haixin Wang, Yibo Chen, Jiawen Wang, Min Chang Jordan Ren, Bosheng Ding, Salman Avestimehr, Chaoyang He

    Abstract: AI Agents powered by Large Language Models are transforming the world through enormous applications. A super agent has the potential to fulfill diverse user needs, such as summarization, coding, and research, by accurately understanding user intent and leveraging the appropriate tools to solve tasks. However, to make such an agent viable for real-world deployment and accessible at scale, significa…

    Submitted 10 April, 2025; originally announced April 2025.

  16. arXiv:2504.10479  [pdf, other]

    cs.CV

    InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

    Authors: Jinguo Zhu, Weiyun Wang, Zhe Chen, Zhaoyang Liu, Shenglong Ye, Lixin Gu, Hao Tian, Yuchen Duan, Weijie Su, Jie Shao, Zhangwei Gao, Erfei Cui, Xuehui Wang, Yue Cao, Yangzhou Liu, Xingguang Wei, Hongjie Zhang, Haomin Wang, Weiye Xu, Hao Li, Jiahao Wang, Nianchen Deng, Songze Li, Yinan He, Tan Jiang, et al. (26 additional authors not shown)

    Abstract: We introduce InternVL3, a significant advancement in the InternVL series featuring a native multimodal pre-training paradigm. Rather than adapting a text-only large language model (LLM) into a multimodal large language model (MLLM) that supports visual inputs, InternVL3 jointly acquires multimodal and linguistic capabilities from both diverse multimodal data and pure-text corpora during a single p…

    Submitted 18 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: Technical Report

  17. arXiv:2504.09925  [pdf, other]

    cs.CV

    FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding

    Authors: Zheng Liu, Mengjie Liu, Jingzhou Chen, Jingwei Xu, Bin Cui, Conghui He, Wentao Zhang

    Abstract: We introduce FUSION, a family of multimodal large language models (MLLMs) with a fully vision-language alignment and integration paradigm. Unlike existing methods that primarily rely on late-stage modality interaction during LLM decoding, our approach achieves deep, dynamic integration throughout the entire processing pipeline. To this end, we propose Text-Guided Unified Vision Encoding, incorpora…

    Submitted 19 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  18. arXiv:2504.09567  [pdf, other]

    stat.ML cs.LG stat.ME

    Conditional Independence Test Based on Transport Maps

    Authors: Chenxuan He, Yuan Gao, Liping Zhu, Jian Huang

    Abstract: Testing conditional independence between two random vectors given a third is a fundamental and challenging problem in statistics, particularly in multivariate nonparametric settings due to the complexity of conditional structures. We propose a novel framework for testing conditional independence using transport maps. At the population level, we show that two well-defined transport maps can transfo…

    Submitted 13 April, 2025; originally announced April 2025.

    Comments: 35 pages

    MSC Class: 62G05; 62G08; 68T07

  19. arXiv:2504.05732  [pdf, other]

    cs.CL

    LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources

    Authors: Haoyu Wang, Yujia Fu, Zhu Zhang, Shuo Wang, Zirui Ren, Xiaorong Wang, Zhili Li, Chaoqun He, Bo An, Zhiyuan Liu, Maosong Sun

    Abstract: Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generations have received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing rel…

    Submitted 14 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

  20. arXiv:2504.04577  [pdf, other]

    math.OC cs.DM

    Minimum Cut Representability of Stable Matching Problems

    Authors: Yuri Faenza, Ayoub Foussoul, Chengyue He

    Abstract: We introduce and study Minimum Cut Representability, a framework to solve optimization and feasibility problems over stable matchings by representing them as minimum s-t cut problems on digraphs over rotations. We provide necessary and sufficient conditions on objective functions and feasibility sets for problems to be minimum cut representable. In particular, we define the concepts of first and s…

    Submitted 6 April, 2025; originally announced April 2025.

  21. arXiv:2504.03976  [pdf, other]

    q-bio.QM cs.AI q-bio.GN

    OLAF: An Open Life Science Analysis Framework for Conversational Bioinformatics Powered by Large Language Models

    Authors: Dylan Riffle, Nima Shirooni, Cody He, Manush Murali, Sovit Nayak, Rishikumar Gopalan, Diego Gonzalez Lopez

    Abstract: OLAF (Open Life Science Analysis Framework) is an open-source platform that enables researchers to perform bioinformatics analyses using natural language. By combining large language models (LLMs) with a modular agent-pipe-router architecture, OLAF generates and executes bioinformatics code on real scientific data, including formats like .h5ad. The system includes an Angular front end and a Python…

    Submitted 10 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  22. Task as Context Prompting for Accurate Medical Symptom Coding Using Large Language Models

    Authors: Chengyang He, Wenlong Zhang, Violet Xinying Chen, Yue Ning, Ping Wang

    Abstract: Accurate medical symptom coding from unstructured clinical text, such as vaccine safety reports, is a critical task with applications in pharmacovigilance and safety monitoring. Symptom coding, as tailored in this study, involves identifying and linking nuanced symptom mentions to standardized vocabularies like MedDRA, differentiating it from broader medical coding tasks. Traditional approaches to…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 11 pages, 5 figures, 5 Tables, ACM/IEEE International Conference on Connected Health: Applications, Systems and Engineering Technologies (CHASE '25), June 24--26, 2025, New York, NY, USA

  23. arXiv:2504.02782  [pdf, other]

    cs.CV

    GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation

    Authors: Zhiyuan Yan, Junyan Ye, Weijia Li, Zilong Huang, Shenghai Yuan, Xiangyang He, Kaiqing Lin, Jun He, Conghui He, Li Yuan

    Abstract: The recent breakthroughs in OpenAI's GPT4o model have demonstrated surprisingly good capabilities in image generation and editing, resulting in significant excitement in the community. This technical report presents the first-look evaluation benchmark (named GPT-ImgEval), quantitatively and qualitatively diagnosing GPT-4o's performance across three critical dimensions: (1) generation quality, (2)…

    Submitted 2 May, 2025; v1 submitted 3 April, 2025; originally announced April 2025.

  24. arXiv:2504.00115  [pdf]

    cs.RO eess.SY

    SACA: A Scenario-Aware Collision Avoidance Framework for Autonomous Vehicles Integrating LLMs-Driven Reasoning

    Authors: Shiyue Zhao, Junzhi Zhang, Neda Masoud, Heye Huang, Xingpeng Xia, Chengkun He

    Abstract: Reliable collision avoidance under extreme situations remains a critical challenge for autonomous vehicles. While large language models (LLMs) offer promising reasoning capabilities, their application in safety-critical evasive maneuvers is limited by latency and robustness issues. Even so, LLMs stand out for their ability to weigh emotional, legal, and ethical factors, enabling socially responsib…

    Submitted 31 March, 2025; originally announced April 2025.

    Comments: 10 pages, 10 figures. This work has been submitted to the IEEE Robotics and Automation Letters (RAL) for possible publication

  25. arXiv:2503.21758  [pdf, other]

    cs.CV

    Lumina-Image 2.0: A Unified and Efficient Image Generative Framework

    Authors: Qi Qin, Le Zhuo, Yi Xin, Ruoyi Du, Zhen Li, Bin Fu, Yiting Lu, Jiakang Yuan, Xinyue Li, Dongyang Liu, Xiangyang Zhu, Manyuan Zhang, Will Beddow, Erwann Millon, Victor Perez, Wenhai Wang, Conghui He, Bo Zhang, Xiaohong Liu, Hongsheng Li, Yu Qiao, Chang Xu, Peng Gao

    Abstract: We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that achieves significant progress compared to previous work, Lumina-Next. Lumina-Image 2.0 is built upon two key principles: (1) Unification - it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task ex…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Tech Report, 21 pages, 12 figures

  26. arXiv:2503.21500  [pdf, other]

    cs.CL

    OpenHuEval: Evaluating Large Language Model on Hungarian Specifics

    Authors: Haote Yang, Xingjian Wei, Jiang Wu, Noémi Ligeti-Nagy, Jiaxing Sun, Yinfan Wang, Zijian Győző Yang, Junyuan Gao, Jingchao Wang, Bowen Jiang, Shasha Wang, Nanjun Yu, Zihao Zhang, Shixin Hong, Hongwei Liu, Wei Li, Songyang Zhang, Dahua Lin, Lijun Wu, Gábor Prószéky, Conghui He

    Abstract: We introduce OpenHuEval, the first benchmark for LLMs focusing on the Hungarian language and specifics. OpenHuEval is constructed from a vast collection of Hungarian-specific materials sourced from multiple origins. In the construction, we incorporated the latest design principles for evaluating LLMs, such as using real user queries from the internet, emphasizing the assessment of LLMs' generative…

    Submitted 27 March, 2025; originally announced March 2025.

  27. arXiv:2503.19448  [pdf, other]

    cs.CV eess.IV

    Towards Robust Time-of-Flight Depth Denoising with Confidence-Aware Diffusion Model

    Authors: Changyong He, Jin Zeng, Jiawei Zhang, Jiajie Guo

    Abstract: Time-of-Flight (ToF) sensors efficiently capture scene depth, but the nonlinear depth construction procedure often results in extremely large noise variance or even invalid areas. Recent methods based on deep neural networks (DNNs) achieve enhanced ToF denoising accuracy but tend to struggle when presented with severe noise corruption due to limited prior knowledge of ToF data distribution. In thi…

    Submitted 25 March, 2025; originally announced March 2025.

  28. arXiv:2503.18484  [pdf, other]

    cs.CV cs.CL

    PM4Bench: A Parallel Multilingual Multi-Modal Multi-task Benchmark for Large Vision Language Model

    Authors: Junyuan Gao, Jiahe Song, Jiang Wu, Runchuan Zhu, Guanlin Shen, Shasha Wang, Xingjian Wei, Haote Yang, Songyang Zhang, Weijia Li, Bin Wang, Dahua Lin, Lijun Wu, Conghui He

    Abstract: Existing multilingual benchmarks for Large Vision Language Models (LVLMs) suffer from limitations including language-specific content biases, disjointed multimodal input formats, and a lack of safety evaluation. To address these gaps, we propose PM4Bench, the first Parallel Multilingual Multi-Modal Multi-task Benchmark for LVLMs. PM4Bench features a parallel corpus design across 10 languages, enab…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Equal contribution: Junyuan Gao, Jiahe Song, Jiang Wu; Corresponding author: Conghui He

  29. arXiv:2503.17439  [pdf, other]

    cs.LG cs.AI

    LEMMA: Learning from Errors for MatheMatical Advancement in LLMs

    Authors: Zhuoshi Pan, Yu Li, Honglin Lin, Qizhi Pei, Zinan Tang, Wei Wu, Chenlin Ming, H. Vicky Zhao, Conghui He, Lijun Wu

    Abstract: Large language models (LLMs) have demonstrated remarkable reasoning capability in solving mathematical problems. However, existing approaches primarily focus on improving the quality of correct training data, e.g., distilling high-quality correct solutions from advanced models, neglecting the value contained in error data, potentially hindering the model's reflective ability. Though some studies a…

    Submitted 21 March, 2025; originally announced March 2025.

    Comments: 9 pages, 6 figures, 4 tables, under review

  30. arXiv:2503.16212  [pdf, other]

    cs.CL cs.AI

    MathFusion: Enhancing Mathematic Problem-solving of LLM through Instruction Fusion

    Authors: Qizhi Pei, Lijun Wu, Zhuoshi Pan, Yu Li, Honglin Lin, Chenlin Ming, Xin Gao, Conghui He, Rui Yan

    Abstract: Large Language Models (LLMs) have shown impressive progress in mathematical reasoning. While data augmentation is promising to enhance mathematical problem-solving ability, current approaches are predominantly limited to instance-level modifications-such as rephrasing or generating syntactic variations-which fail to capture and leverage the intrinsic relational structures inherent in mathematical…

    Submitted 20 March, 2025; originally announced March 2025.

    Comments: Work in progress

  31. arXiv:2503.15264  [pdf, other]

    cs.CV

    LEGION: Learning to Ground and Explain for Synthetic Image Detection

    Authors: Hengrui Kang, Siwei Wen, Zichen Wen, Junyan Ye, Weijia Li, Peilin Feng, Baichuan Zhou, Bin Wang, Dahua Lin, Linfeng Zhang, Conghui He

    Abstract: The rapid advancements in generative technology have emerged as a double-edged sword. While offering powerful tools that enhance convenience, they also pose significant social concerns. As defenders, current synthetic image detection methods often lack artifact-level textual interpretability and are overly focused on image manipulation detection, and current datasets usually suffer from outdated g…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: Project Page: https://opendatalab.github.io/LEGION

  32. arXiv:2503.14905  [pdf, other]

    cs.CV

    Spot the Fake: Large Multimodal Model-Based Synthetic Image Detection with Artifact Explanation

    Authors: Siwei Wen, Junyan Ye, Peilin Feng, Hengrui Kang, Zichen Wen, Yize Chen, Jiang Wu, Wenjun Wu, Conghui He, Weijia Li

    Abstract: With the rapid advancement of Artificial Intelligence Generated Content (AIGC) technologies, synthetic images have become increasingly prevalent in everyday life, posing new challenges for authenticity assessment and detection. Despite the effectiveness of existing methods in evaluating image authenticity and locating forgeries, these approaches often lack human interpretability and do not fully a…

    Submitted 19 March, 2025; originally announced March 2025.

  33. arXiv:2503.14891  [pdf, other]

    cs.CL cs.AI

    MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer

    Authors: Honglin Lin, Zhuoshi Pan, Yu Li, Qizhi Pei, Xin Gao, Mengzhang Cai, Conghui He, Lijun Wu

    Abstract: Large Language Models (LLMs) have demonstrated promising capabilities in solving mathematical reasoning tasks, leveraging Chain-of-Thought (CoT) data as a vital component in guiding answer generation. Current paradigms typically generate CoT and answers directly for a given problem, diverging from human problem-solving strategies to some extent. Humans often solve problems by recalling analogous c…

    Submitted 19 March, 2025; originally announced March 2025.

  34. arXiv:2503.14554  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Synchronous vs Asynchronous Reinforcement Learning in a Real World Robot

    Authors: Ali Parsaee, Fahim Shahriar, Chuxin He, Ruiqing Tan

    Abstract: In recent times, reinforcement learning (RL) with physical robots has attracted the attention of a wide range of researchers. However, state-of-the-art RL algorithms do not consider that physical environments do not wait for the RL agent to make decisions or updates. RL agents learn by periodically conducting computationally expensive gradient updates. When decision-making and gradient update task…

    Submitted 17 March, 2025; originally announced March 2025.

    Comments: Presented at Alberta Robotics & Intelligent Systems Expo (RISE) Conference

  35. arXiv:2503.09294  [pdf, other]

    cs.CV

    IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond

    Authors: Peng Hu, Chunming He, Lei Xu, Jingduo Tian, Sina Farsiu, Yulun Zhang, Pei Liu, Xiu Li

    Abstract: Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs. Conventional approaches predominantly rely on learning feature representations from ground-truth (GT) data; however, inherent imperfections in GT datasets constrain restoration performance to the mean quality level of the training data, rather than attainin…

    Submitted 12 March, 2025; originally announced March 2025.

  36. arXiv:2503.05933  [pdf, other]

    eess.IV cs.CV

    Beyond H&E: Unlocking Pathological Insights with Polarization via Self-supervised Learning

    Authors: Yao Du, Jiaxin Zhuang, Xiaoyu Zheng, Jing Cong, Limei Guo, Chao He, Lin Luo, Xiaomeng Li

    Abstract: Histopathology image analysis is fundamental to digital pathology, with hematoxylin and eosin (H&E) staining as the gold standard for diagnostic and prognostic assessments. While H&E imaging effectively highlights cellular and tissue structures, it lacks sensitivity to birefringence and tissue anisotropy, which are crucial for assessing collagen organization, fiber alignment, and microstructural a…

    Submitted 5 March, 2025; originally announced March 2025.

  37. arXiv:2503.03284  [pdf, other]

    cs.CV cs.GR

    Gaussian highpass guided image filtering

    Authors: Lei Zhao, Chuanjiang He

    Abstract: Guided image filtering (GIF) is a popular smoothing technique, in which an additional image is used as a structure guidance for noise removal with edge preservation. The original GIF and some of its subsequent improvements are derived from a two-parameter local affine model (LAM), where the filtering output is a local affine transformation of the guidance image, but the input image is not taken in…

    Submitted 5 March, 2025; originally announced March 2025.

  38. arXiv:2503.01358  [pdf, other]

    cs.HC

    RemiHaven: Integrating "In-Town" and "Out-of-Town" Peers to Provide Personalized Reminiscence Support for Older Drifters

    Authors: Xuechen Zhang, Changyang He, Peng Zhang, Hansu Gu, Ning Gu, Qi Shen, Zhan Hu, Tun Lu

    Abstract: With increasing social mobility and an aging society, more older adults in China are migrating to new cities, known as "older drifters." Due to fewer social connections and cultural adaptation challenges, they face negative emotions such as loneliness and depression. While reminiscence-based interventions have been used to improve older adults' psychological well-being, challenges such as the lack…

    Submitted 3 March, 2025; originally announced March 2025.

  39. arXiv:2503.01352  [pdf, other]

    eess.IV cs.CV

    Diffusion-based Virtual Staining from Polarimetric Mueller Matrix Imaging

    Authors: Xiaoyu Zheng, Jing Wen, Jiaxin Zhuang, Yao Du, Jing Cong, Limei Guo, Chao He, Lin Luo, Hao Chen

    Abstract: Polarization, as a new optical imaging tool, has been explored to assist in the diagnosis of pathology. Moreover, converting the polarimetric Mueller Matrix (MM) to standardized stained images becomes a promising approach to help pathologists interpret the results. However, existing methods for polarization-based virtual staining are still in the early stage, and the diffusion-based model, which h…

    Submitted 3 March, 2025; originally announced March 2025.

  40. arXiv:2502.20823  [pdf, other]

    cs.CV

    Can We Simplify Slide-level Fine-tuning of Pathology Foundation Models?

    Authors: Jiawen Li, Jiali Hu, Qiehe Sun, Renao Yan, Minxi Ouyang, Tian Guan, Anjia Han, Chao He, Yonghong He

    Abstract: The emergence of foundation models in computational pathology has transformed histopathological image analysis, with whole slide imaging (WSI) diagnosis being a core application. Traditionally, weakly supervised fine-tuning via multiple instance learning (MIL) has been the primary method for adapting foundation models to WSIs. However, in this work we present a key experimental finding: a simple n…

    Submitted 28 February, 2025; originally announced February 2025.

    Comments: 11 pages, 3 figures, 4 tables

  41. arXiv:2502.20653  [pdf, other]

    cs.CV cs.AI cs.LG

    Dataset Distillation with Neural Characteristic Function: A Minmax Perspective

    Authors: Shaobo Wang, Yicun Yang, Zhiyuan Liu, Chenghao Sun, Xuming Hu, Conghui He, Linfeng Zhang

    Abstract: Dataset distillation has emerged as a powerful approach for reducing data requirements in deep learning. Among various methods, distribution matching-based approaches stand out for their balance of computational efficiency and strong performance. However, existing distance metrics used in distribution matching often fail to accurately capture distributional differences, leading to unreliable measu…

    Submitted 27 February, 2025; originally announced February 2025.

    Comments: Accepted by CVPR 2025, 11 pages, 7 figures

    Journal ref: Conference on Computer Vision and Pattern Recognition, 2025

  42. arXiv:2502.16802  [pdf, other]

    cs.CL cs.AI

    Unsupervised Topic Models are Data Mixers for Pre-training Language Models

    Authors: Jiahui Peng, Xinlin Zhuang, Qiu Jiantao, Ren Ma, Jing Yu, Tianyi Bai, Conghui He

    Abstract: The performance of large language models (LLMs) is significantly affected by the quality and composition of their pre-training data, which is inherently diverse, spanning various domains, sources, and topics. Effectively integrating these heterogeneous data sources is crucial for optimizing LLM performance. Previous research has predominantly concentrated on domain-based data mixing, often neglect…

    Submitted 5 March, 2025; v1 submitted 23 February, 2025; originally announced February 2025.

    Comments: 18 pages, 7 figures

  43. arXiv:2502.15902  [pdf, other]

    cs.LG cs.AI cs.CL

    IPAD: Inverse Prompt for AI Detection -- A Robust and Explainable LLM-Generated Text Detector

    Authors: Zheng Chen, Yushi Feng, Changyang He, Yue Deng, Hongxi Pu, Bo Li

    Abstract: Large Language Models (LLMs) have attained human-level fluency in text generation, which complicates the distinguishing between human-written and LLM-generated texts. This increases the risk of misuse and highlights the need for reliable detectors. Yet, existing detectors exhibit poor robustness on out-of-distribution (OOD) data and attacked data, which is critical for real-world scenarios. Also,…

    Submitted 21 February, 2025; originally announced February 2025.

  44. arXiv:2502.14471  [pdf, other]

    cs.CV

    Integrating Extra Modality Helps Segmentor Find Camouflaged Objects Well

    Authors: Chengyu Fang, Chunming He, Longxiang Tang, Yuelin Zhang, Chenyang Zhu, Yuqi Shen, Chubin Chen, Guoxia Xu, Xiu Li

    Abstract: Camouflaged Object Segmentation (COS) remains a challenging problem due to the subtle visual differences between camouflaged objects and backgrounds. Owing to the exceedingly limited visual cues available from visible spectrum, previous RGB single-modality approaches often struggle to achieve satisfactory results, prompting the exploration of multimodal data to enhance detection accuracy. In this…

    Submitted 20 February, 2025; originally announced February 2025.

    Comments: 12 pages, 5 figures, 6 tables

  45. arXiv:2502.11501  [pdf, other]

    cs.CL cs.CV

    Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?

    Authors: Zichen Wen, Yifeng Gao, Weijia Li, Conghui He, Linfeng Zhang

    Abstract: Multimodal large language models (MLLMs) have shown remarkable performance for cross-modal understanding and generation, yet still suffer from severe inference costs. Recently, abundant works have been proposed to solve this problem with token pruning, which identifies the redundant tokens in MLLMs and then prunes them to reduce the computation and KV storage costs, leading to significant accelera…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 12 pages, 3 figures

  46. arXiv:2502.11494  [pdf, other]

    cs.CL cs.CV

    Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More

    Authors: Zichen Wen, Yifeng Gao, Shaobo Wang, Junyuan Zhang, Qintong Zhang, Weijia Li, Conghui He, Linfeng Zhang

    Abstract: Vision tokens in multimodal large language models often dominate huge computational overhead due to their excessive length compared to linguistic modality. Abundant recent methods aim to solve this problem with token pruning, which first defines an importance criterion for tokens and then prunes the unimportant vision tokens during inference. However, in this paper, we show that the importance is…

    Submitted 17 February, 2025; originally announced February 2025.

    Comments: 15 pages, 8 figures

  47. arXiv:2502.08960  [pdf, ps, other]

    cs.LG

    A Comprehensive Survey on Imbalanced Data Learning

    Authors: Xinyi Gao, Dongting Xie, Yihang Zhang, Zhengren Wang, Conghui He, Hongzhi Yin, Wentao Zhang

    Abstract: With the expansion of data availability, machine learning (ML) has achieved remarkable breakthroughs in both academia and industry. However, imbalanced data distributions are prevalent in various types of raw data and severely hinder the performance of ML by biasing the decision-making processes. To deepen the understanding of imbalanced data and facilitate the related research and applications, t…

    Submitted 12 February, 2025; originally announced February 2025.

  48. arXiv:2502.08142  [pdf, other]

    cs.AI

    Bridging the Safety Gap: A Guardrail Pipeline for Trustworthy LLM Inferences

    Authors: Shanshan Han, Salman Avestimehr, Chaoyang He

    Abstract: We present Wildflare GuardRail, a guardrail pipeline designed to enhance the safety and reliability of Large Language Model (LLM) inferences by systematically addressing risks across the entire processing workflow. Wildflare GuardRail integrates several core functional modules, including Safety Detector that identifies unsafe inputs and detects hallucinations in model outputs while generating root…

    Submitted 12 February, 2025; originally announced February 2025.

    Comments: arXiv admin note: text overlap with arXiv:2406.10847

  49. arXiv:2502.07346  [pdf, other]

    cs.CL

    BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models

    Authors: Xu Huang, Wenhao Zhu, Hanxu Hu, Conghui He, Lei Li, Shujian Huang, Fei Yuan

    Abstract: Previous multilingual benchmarks focus primarily on simple understanding tasks, but for large language models (LLMs), we emphasize proficiency in instruction following, reasoning, long context understanding, code generation, and so on. However, measuring these advanced capabilities across languages is underexplored. To address the disparity, we introduce BenchMAX, a multi-way multilingual evaluatio…

    Submitted 20 April, 2025; v1 submitted 11 February, 2025; originally announced February 2025.

  50. arXiv:2502.06782  [pdf, other]

    cs.CV

    Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

    Authors: Dongyang Liu, Shicheng Li, Yutong Liu, Zhen Li, Kai Wang, Xinyue Li, Qi Qin, Yufei Liu, Yi Xin, Zhongyu Li, Bin Fu, Chenyang Si, Yuewen Cao, Conghui He, Ziwei Liu, Yu Qiao, Qibin Hou, Hongsheng Li, Peng Gao

    Abstract: Recent advancements have established Diffusion Transformers (DiTs) as a dominant framework in generative modeling. Building on this success, Lumina-Next achieves exceptional performance in the generation of photorealistic images with Next-DiT. However, its potential for video generation remains largely untapped, with significant challenges in modeling the spatiotemporal complexity inherent to vide…

    Submitted 12 February, 2025; v1 submitted 10 February, 2025; originally announced February 2025.