
Showing 1–50 of 11,240 results for author: Guo

Searching in archive cs.
  1. arXiv:2506.18294  [pdf, ps, other]

    cs.RO

    Improvement on LiDAR-Camera Calibration Using Square Targets

    Authors: Zhongyuan Li, Honggang Gou, Ping Li, Jiaotong Guo, Mao Ye

    Abstract: Precise sensor calibration is critical for autonomous vehicles as a prerequisite for perception algorithms to function properly. A rotation error of one degree can translate into a position error of meters when detecting target objects at large distances, leading to improper system reactions or even safety-related issues. Many methods for multi-sensor calibration have been proposed. However, there are…

    Submitted 23 June, 2025; originally announced June 2025.

  2. arXiv:2506.18292  [pdf]

    cs.CV

    Rapeseed population point cloud completion network (RP-PCN) with dynamic graph convolution for 3D reconstruction of crop canopy occlusion architecture

    Authors: Ziyue Guo, Xin Yang, Yutao Shen, Yang Zhu, Lixi Jiang, Haiyan Cen

    Abstract: Quantitative descriptions of complete canopy architecture are crucial for evaluating crop photosynthesis and yield to guide ideotype design. Although three-dimensional (3D) sensing technologies have been developed for plant and canopy reconstruction, severe occlusion and complex architectures hinder accurate canopy descriptions. In this study, we propose a point cloud completion model for 3D recon…

    Submitted 23 June, 2025; originally announced June 2025.

  3. arXiv:2506.18246  [pdf, ps, other]

    cs.CV

    Referring Expression Instance Retrieval and A Strong End-to-End Baseline

    Authors: Xiangzhao Hao, Kuan Zhu, Hongyu Guo, Haiyun Guo, Ming Tang, JinQiao Wang

    Abstract: Natural language querying of visual content underpins many vision-language tasks, typically categorized by text granularity and visual search scope. Text-Image Retrieval (TIR) retrieves whole images using coarse descriptions, while Referring Expression Comprehension (REC) localizes objects using fine-grained expressions within a single image. However, real-world scenarios often require both instan…

    Submitted 22 June, 2025; originally announced June 2025.

  4. arXiv:2506.18088  [pdf, ps, other]

    cs.RO cs.AI cs.CL cs.CV cs.MA

    RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation

    Authors: Tianxing Chen, Zanxin Chen, Baijun Chen, Zijian Cai, Yibin Liu, Qiwei Liang, Zixuan Li, Xianliang Lin, Yiheng Ge, Zhenyu Gu, Weiliang Deng, Yubin Guo, Tian Nian, Xuanbing Xie, Qiangyu Chen, Kailun Su, Tianling Xu, Guodong Liu, Mengkang Hu, Huan-ang Gao, Kaixuan Wang, Zhixuan Liang, Yusen Qin, Xiaokang Yang, Ping Luo, et al. (1 additional author not shown)

    Abstract: Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Project Page: https://robotwin-platform.github.io/

  5. arXiv:2506.18084  [pdf, ps, other]

    cs.CV

    TEM^3-Learning: Time-Efficient Multimodal Multi-Task Learning for Advanced Assistive Driving

    Authors: Wenzhuo Liu, Yicheng Qiao, Zhen Wang, Qiannan Guo, Zilong Chen, Meihua Zhou, Xinran Li, Letian Wang, Zhiwei Li, Huaping Liu, Wenshuo Wang

    Abstract: Multi-task learning (MTL) can advance assistive driving by exploring inter-task correlations through shared representations. However, existing methods face two critical limitations: single-modality constraints limiting comprehensive scene understanding and inefficient architectures impeding real-time deployment. This paper proposes TEM^3-Learning (Time-Efficient Multimodal Multi-task Learning), a…

    Submitted 22 June, 2025; originally announced June 2025.

  6. arXiv:2506.18046  [pdf, ps, other]

    cs.LG

    TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

    Authors: Xiangfei Qiu, Zhe Li, Wanghui Qiu, Shiyan Hu, Lekui Zhou, Xingjian Wu, Zhengyu Li, Chenjuan Guo, Aoying Zhou, Zhenli Sheng, Jilin Hu, Christian S. Jensen, Bin Yang

    Abstract: Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of relia…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Accepted by PVLDB 2025

  7. arXiv:2506.18017  [pdf, ps, other]

    cs.GR cs.AI cs.CV

    Auto-Regressive Surface Cutting

    Authors: Yang Li, Victor Cheung, Xinhai Liu, Yuguang Chen, Zhongjin Luo, Biwen Lei, Haohan Weng, Zibo Zhao, Jingwei Huang, Zhuo Chen, Chunchao Guo

    Abstract: Surface cutting is a fundamental task in computer graphics, with applications in UV parameterization, texture mapping, and mesh decomposition. However, existing methods often produce technically valid but overly fragmented atlases that lack semantic coherence. We introduce SeamGPT, an auto-regressive model that generates cutting seams by mimicking professional workflows. Our key technical innovati…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: Tech. report. https://victorcheung12.github.io/seamgpt

  8. arXiv:2506.17869  [pdf, ps, other]

    cs.CV cs.RO

    Cross-modal State Space Modeling for Real-time RGB-thermal Wild Scene Semantic Segmentation

    Authors: Xiaodong Guo, Zi'ang Lin, Luwen Hu, Zhihong Deng, Tong Liu, Wujie Zhou

    Abstract: The integration of RGB and thermal data can significantly improve semantic segmentation performance in wild environments for field robots. Nevertheless, multi-source data processing (e.g., Transformer-based approaches) imposes significant computational overhead, presenting challenges for resource-constrained systems. To resolve this critical limitation, we introduce CM-SSM, an efficient RGB-therma…

    Submitted 21 June, 2025; originally announced June 2025.

  9. arXiv:2506.17851  [pdf]

    physics.soc-ph cs.CY cs.SI stat.AP

    Triadic Novelty: A Typology and Measurement Framework for Recognizing Novel Contributions in Science

    Authors: Jin Ai, Richard S. Steinberg, Chao Guo, Filipi Nascimento Silva

    Abstract: Scientific progress depends on novel ideas, but current reward systems often fail to recognize them. Many existing metrics conflate novelty with popularity, privileging ideas that fit existing paradigms over those that challenge them. This study develops a theory-driven framework to better understand how different types of novelty emerge, take hold, and receive recognition. Drawing on network scie…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 27 pages, 3 figures, 5 tables

  10. arXiv:2506.17761  [pdf, ps, other]

    cs.LG

    Towards a Unified Textual Graph Framework for Spectral Reasoning via Physical and Chemical Information Fusion

    Authors: Jiheng Liang, Ziru Yu, Zujie Xie, Yuchen Guo, Yulan Guo, Xiangyang Yu

    Abstract: Motivated by the limitations of current spectral analysis methods, such as reliance on single-modality data, limited generalizability, and poor interpretability, we propose a novel multi-modal spectral analysis framework that integrates prior knowledge graphs with Large Language Models. Our method explicitly bridges physical spectral measurements and chemical structural semantics by representing the…

    Submitted 21 June, 2025; originally announced June 2025.

    Comments: 16 pages, 7 figures, 8 tables

  11. arXiv:2506.17640  [pdf, ps, other]

    cs.SI

    Empowering Iterative Graph Alignment Using Heat Diffusion

    Authors: Boyan Wang, Weijie Feng, Jinyang Huang, Dan Guo, Zhi Liu

    Abstract: Unsupervised plain graph alignment (UPGA) aims to align corresponding nodes across two graphs without any auxiliary information. Existing UPGA methods rely on structural consistency while neglecting the inherent structural differences in real-world graphs, leading to biased node representations. Moreover, their one-shot alignment strategies lack mechanisms to correct erroneous matches arising from…

    Submitted 21 June, 2025; originally announced June 2025.

  12. arXiv:2506.17632  [pdf, ps, other]

    cs.CV

    Optimization-Free Patch Attack on Stereo Depth Estimation

    Authors: Hangcheng Liu, Xu Kuang, Xingshuo Han, Xingwan Wu, Haoran Ou, Shangwei Guo, Xingyi Huang, Tao Xiang, Tianwei Zhang

    Abstract: Stereo Depth Estimation (SDE) is essential for scene understanding in vision-based systems like autonomous driving. However, recent studies show that SDE models are vulnerable to adversarial attacks, which are often limited to unrealistic settings, e.g., digital perturbations on separate stereo views in static scenes, restricting their real-world applicability. This raises a critical question: how…

    Submitted 21 June, 2025; originally announced June 2025.

  13. arXiv:2506.17578  [pdf, ps, other]

    cs.CL

    AgriCHN: A Comprehensive Cross-domain Resource for Chinese Agricultural Named Entity Recognition

    Authors: Lingxiao Zeng, Yiqi Tong, Wei Guo, Huarui Wu, Lihao Ge, Yijun Ye, Fuzhen Zhuang, Deqing Wang, Wei Guo, Cheng Chen

    Abstract: Agricultural named entity recognition is a specialized task focusing on identifying distinct agricultural entities within vast bodies of text, including crops, diseases, pests, and fertilizers. It plays a crucial role in enhancing information extraction from extensive agricultural text resources. However, the scarcity of high-quality agricultural datasets, particularly in Chinese, has resulted in…

    Submitted 21 June, 2025; originally announced June 2025.

  14. arXiv:2506.17562  [pdf, ps, other]

    cs.CV cs.CL

    LLM-driven Medical Report Generation via Communication-efficient Heterogeneous Federated Learning

    Authors: Haoxuan Che, Haibo Jin, Zhengrui Guo, Yi Lin, Cheng Jin, Hao Chen

    Abstract: LLMs have demonstrated significant potential in Medical Report Generation (MRG), yet their development requires large amounts of medical image-report pairs, which are commonly scattered across multiple centers. Centralizing these data is exceptionally challenging due to privacy regulations, thereby impeding model development and broader adoption of LLM-driven MRG models. To address this challenge,…

    Submitted 20 June, 2025; originally announced June 2025.

  15. arXiv:2506.17319  [pdf]

    cs.CY cs.LG

    Using Machine Learning in Analyzing Air Quality Discrepancies of Environmental Impact

    Authors: Shuangbao Paul Wang, Lucas Yang, Rahouane Chouchane, Jin Guo, Michael Bailey

    Abstract: In this study, we apply machine learning and software engineering to analyze air pollution levels in the City of Baltimore. The data model was fed with three primary data sources: 1) a biased method of estimating insurance risk used by the Home Owners' Loan Corporation, 2) demographics of Baltimore residents, and 3) census-data estimates of NO2 and PM2.5 concentrations. The dataset covers 650,643 Baltimore…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: IEEE 2024 International Conference on AI x Data & Knowledge Engineering (AIxDKE)

  16. arXiv:2506.17281  [pdf, ps, other]

    cs.IR cs.AI

    CORONA: A Coarse-to-Fine Framework for Graph-based Recommendation with Large Language Models

    Authors: Junze Chen, Xinjie Yang, Cheng Yang, Junfei Bao, Zeyuan Guo, Yawen Li, Chuan Shi

    Abstract: Recommender systems (RSs) are designed to retrieve candidate items a user might be interested in from a large pool. A common approach is using graph neural networks (GNNs) to capture high-order interaction relationships. As large language models (LLMs) have shown strong capabilities across domains, researchers are exploring their use to enhance recommendation. However, prior work limits LLMs to re…

    Submitted 14 June, 2025; originally announced June 2025.

  17. arXiv:2506.17119  [pdf, ps, other]

    cs.CV cs.RO

    RGBTrack: Fast, Robust Depth-Free 6D Pose Estimation and Tracking

    Authors: Teng Guo, Jingjin Yu

    Abstract: We introduce a robust framework, RGBTrack, for real-time 6D pose estimation and tracking that operates solely on RGB data, thereby eliminating the need for depth input for such dynamic and precise object pose tracking tasks. Building on the FoundationPose architecture, we devise a novel binary search strategy combined with a render-and-compare mechanism to efficiently infer depth and generate robu…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to IROS 2025

  18. arXiv:2506.17114  [pdf, ps, other]

    cs.AI

    Mathematical Proof as a Litmus Test: Revealing Failure Modes of Advanced Large Reasoning Models

    Authors: Dadi Guo, Jiayu Liu, Zhiyuan Fan, Zhitao He, Haoran Li, Yumeng Wang, Yi R. Fung

    Abstract: Large reasoning models (e.g., R1, o3) have demonstrated remarkable mathematical problem-solving abilities. However, the high accuracy these models report on popular datasets, together with reliance on purely numerical evaluation and potential benchmark leakage, often masks their true reasoning shortcomings. To address this, we propose leveraging the inherent rigor and methodological complexity of…

    Submitted 23 June, 2025; v1 submitted 20 June, 2025; originally announced June 2025.

  19. arXiv:2506.17110  [pdf, ps, other]

    cs.RO cs.CV

    Monocular One-Shot Metric-Depth Alignment for RGB-Based Robot Grasping

    Authors: Teng Guo, Baichuan Huang, Jingjin Yu

    Abstract: Accurate 6D object pose estimation is a prerequisite for successfully completing robotic prehensile and non-prehensile manipulation tasks. At present, 6D pose estimation for robotic manipulation generally relies on depth sensors based on, e.g., structured light, time-of-flight, and stereo-vision, which can be expensive, produce noisy output (as compared with RGB cameras), and fail to handle transp…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to IROS 2025

  20. arXiv:2506.17104  [pdf, ps, other]

    cs.AI cs.CL cs.LO

    Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

    Authors: Chuxue Cao, Mengze Li, Juntao Dai, Jinluan Yang, Zijian Zhao, Shengyu Zhang, Weijie Shi, Chengzhong Liu, Sirui Han, Yike Guo

    Abstract: Large language models (LLMs) have shown promising first-order logic (FOL) reasoning capabilities with applications in various areas. However, their effectiveness in complex mathematical reasoning involving multi-step FOL deductions is still under-researched. While LLMs perform competitively on established mathematical reasoning benchmarks, they struggle with multi-step FOL tasks, as demonstrated b…

    Submitted 20 June, 2025; originally announced June 2025.

  21. arXiv:2506.16716  [pdf, ps, other]

    cs.HC

    V-CASS: Vision-context-aware Expressive Speech Synthesis for Enhancing User Understanding of Videos

    Authors: Qixin Wang, Songtao Zhou, Zeyu Jin, Chenglin Guo, Shikun Sun, Xiaoyu Qin

    Abstract: Automatic video commentary systems are widely used on multimedia social media platforms to extract factual information about video content. However, current systems may overlook essential para-linguistic cues, including emotion and attitude, which are critical for fully conveying the meaning of visual content. The absence of these cues can limit user understanding or, in some cases, distort the vi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Accepted by IJCNN 2025

  22. arXiv:2506.16691  [pdf, ps, other]

    cs.CV

    LaVi: Efficient Large Vision-Language Models via Internal Feature Modulation

    Authors: Tongtian Yue, Longteng Guo, Yepeng Tang, Zijia Zhao, Xinxin Zhu, Hua Huang, Jing Liu

    Abstract: Despite the impressive advancements of Large Vision-Language Models (LVLMs), existing approaches suffer from a fundamental bottleneck: inefficient visual-language integration. Current methods either disrupt the model's inherent structure or introduce severe long-context computational burden, severely limiting scalability and efficiency. In this paper, we rethink multimodal integration and present…

    Submitted 19 June, 2025; originally announced June 2025.

  23. arXiv:2506.16690  [pdf, ps, other]

    cs.CV

    DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches

    Authors: Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo

    Abstract: Stereo depth estimation is a critical task in autonomous driving and robotics, where inaccuracies (such as misidentifying nearby objects as distant) can lead to dangerous situations. Adversarial attacks against stereo depth estimation can help reveal vulnerabilities before deployment. Previous work has shown that repeating optimized textures can effectively mislead stereo depth estimation in digit…

    Submitted 19 June, 2025; originally announced June 2025.

  24. arXiv:2506.16677  [pdf, ps, other]

    cs.HC cs.RO

    PPTP: Performance-Guided Physiological Signal-Based Trust Prediction in Human-Robot Collaboration

    Authors: Hao Guo, Wei Fan, Shaohui Liu, Feng Jiang, Chunzhi Yi

    Abstract: Trust prediction is a key issue in human-robot collaboration, especially in construction scenarios where maintaining appropriate trust calibration is critical for safety and efficiency. This paper introduces the Performance-guided Physiological signal-based Trust Prediction (PPTP), a novel framework designed to improve trust assessment. We designed a human-robot construction scenario with three di…

    Submitted 19 June, 2025; originally announced June 2025.

  25. arXiv:2506.16504  [pdf, ps, other]

    cs.CV cs.AI

    Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details

    Authors: Zeqiang Lai, Yunfei Zhao, Haolin Liu, Zibo Zhao, Qingxiang Lin, Huiwen Shi, Xianghui Yang, Mingxin Yang, Shuhui Yang, Yifei Feng, Sheng Zhang, Xin Huang, Di Luo, Fan Yang, Fang Yang, Lifu Wang, Sicong Liu, Yixuan Tang, Yulin Cai, Zebin He, Tian Liu, Yuhong Liu, Jie Jiang, Linus, Jingwei Huang, et al. (1 additional author not shown)

    Abstract: In this report, we present Hunyuan3D 2.5, a robust suite of 3D diffusion models aimed at generating high-fidelity and detailed textured 3D assets. Hunyuan3D 2.5 follows the two-stage pipeline of its predecessor, Hunyuan3D 2.0, while demonstrating substantial advancements in both shape and texture generation. In terms of shape generation, we introduce a new shape foundation model, LATTICE, which…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: Technical report

  26. arXiv:2506.16500  [pdf, ps, other]

    cs.LG

    SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity

    Authors: Samir Khaki, Xiuyu Li, Junxian Guo, Ligeng Zhu, Chenfeng Xu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu

    Abstract: Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as QLoRA and DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they may even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsi…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: ICML 2025. The first three authors contributed equally to this work. Project page: https://z-lab.ai/projects/sparselora

  27. arXiv:2506.16096  [pdf, ps, other]

    cs.LG cs.AI

    A Brain-to-Population Graph Learning Framework for Diagnosing Brain Disorders

    Authors: Qianqian Liao, Wuque Cai, Hongze Sun, Dongze Liu, Duo Chen, Dezhong Yao, Daqing Guo

    Abstract: Recently developed graph-based methods for diagnosing brain disorders from functional connectivity rely heavily on predefined brain atlases, but overlook the rich information embedded within atlases and the confounding effects of site and phenotype variability. To address these challenges, we propose a two-stage Brain-to-Population Graph Learning (B2P-GL) framework that integrates the semantic simil…

    Submitted 19 June, 2025; originally announced June 2025.

    Comments: 16 pages, 7 figures, 13 tables; this paper has been submitted for possible publication

  28. arXiv:2506.16024  [pdf, ps, other]

    cs.CL cs.AI

    From General to Targeted Rewards: Surpassing GPT-4 in Open-Ended Long-Context Generation

    Authors: Zhihan Guo, Jiele Wu, Wenqian Cui, Yifei Zhang, Minda Hu, Yufei Wang, Irwin King

    Abstract: Current research on long-form context in Large Language Models (LLMs) primarily focuses on understanding long contexts, while Open-ended Long Text Generation (Open-LTG) remains insufficiently explored. Training a long-context generation model requires curation of gold-standard reference data, which is typically nonexistent for informative Open-LTG tasks. However, previous methods only utilize…

    Submitted 19 June, 2025; originally announced June 2025.

  29. arXiv:2506.15969  [pdf, ps, other]

    cs.LG cs.CL

    LazyEviction: Lagged KV Eviction with Attention Pattern Observation for Efficient Long Reasoning

    Authors: Haoyue Zhang, Hualei Zhang, Xiaosong Ma, Jie Zhang, Song Guo

    Abstract: Large Language Models (LLMs) exhibit enhanced reasoning capabilities by employing Chain-of-Thought (CoT). However, the extended reasoning sequences introduce significant GPU memory overhead due to increased key-value (KV) cache size, particularly in tasks requiring long reasoning sequences, such as mathematics and programming. Existing KV cache compression methods mitigate memory bottlenecks but s…

    Submitted 18 June, 2025; originally announced June 2025.

  30. arXiv:2506.15873  [pdf, ps, other]

    cs.HC

    DeckFlow: Iterative Specification on a Multimodal Generative Canvas

    Authors: Gregory Croisdale, Emily Huang, John Joon Young Chung, Anhong Guo, Xu Wang, Austin Z. Henley, Cyrus Omar

    Abstract: Generative AI promises to allow people to create high-quality personalized media. Although powerful, we identify three fundamental design problems with existing tooling through a literature review. We introduce a multimodal generative AI tool, DeckFlow, to address these problems. First, DeckFlow supports task decomposition by allowing users to maintain multiple interconnected subtasks on an infini…

    Submitted 18 June, 2025; originally announced June 2025.

  31. arXiv:2506.15844  [pdf, ps, other]

    cs.DS

    HybHuff: Lossless Compression for Hypergraphs via Entropy-Guided Huffman-Bitwise Coordination

    Authors: Tianyu Zhao, Dongfang Zhao, Luanzheng Guo, Nathan Tallent

    Abstract: Hypergraphs provide a natural representation for many-to-many relationships in data-intensive applications, yet their scalability is often hindered by high memory consumption. While prior work has improved computational efficiency, reducing the space overhead of hypergraph representations remains a major challenge. This paper presents a hybrid compression framework for integer-based hypergraph adj…

    Submitted 18 June, 2025; originally announced June 2025.

  32. arXiv:2506.15786  [pdf, ps, other]

    cs.GR cs.AI cs.LG physics.comp-ph physics.optics

    Graphics4Science: Computer Graphics for Scientific Impacts

    Authors: Peter Yichen Chen, Minghao Guo, Hanspeter Pfister, Ming Lin, William Freeman, Qixing Huang, Han-Wei Shen, Wojciech Matusik

    Abstract: Computer graphics, often associated with films, games, and visual effects, has long been a powerful tool for addressing scientific challenges, from its origins in 3D visualization for medical imaging to its role in modern computational modeling and simulation. This course explores the deep and evolving relationship between computer graphics and science, highlighting past achievements, ongoing cont…

    Submitted 18 June, 2025; originally announced June 2025.

  33. arXiv:2506.15721  [pdf, ps, other]

    cs.LG

    Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration

    Authors: Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi

    Abstract: Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from a limited domain for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2)…

    Submitted 4 June, 2025; originally announced June 2025.

  34. arXiv:2506.15718  [pdf, ps, other]

    cs.LG

    BuildingBRep-11K: Precise Multi-Storey B-Rep Building Solids with Rich Layout Metadata

    Authors: Yu Guo, Hongji Fang, Tianyu Fang, Zhe Cui

    Abstract: With the rise of artificial intelligence, the automatic generation of building-scale 3-D objects has become an active research topic, yet training such models still demands large, clean and richly annotated datasets. We introduce BuildingBRep-11K, a collection of 11,978 multi-storey (2-10 floors) buildings (about 10 GB) produced by a shape-grammar-driven pipeline that encodes established building-…

    Submitted 2 June, 2025; originally announced June 2025.

  35. arXiv:2506.15717  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    daDPO: Distribution-Aware DPO for Distilling Conversational Abilities

    Authors: Zhengze Zhang, Shiqi Wang, Yiqun Shen, Simin Guo, Dahua Lin, Xiaoliang Wang, Nguyen Cam-Tu, Fei Tan

    Abstract: Large language models (LLMs) have demonstrated exceptional performance across various applications, but their conversational abilities decline sharply as model size decreases, presenting a barrier to their deployment in resource-constrained environments. Knowledge distillation with Direct Preference Optimization (dDPO) has emerged as a promising approach to enhancing the conversational abilities o…

    Submitted 2 June, 2025; originally announced June 2025.

  36. arXiv:2506.15695  [pdf, ps, other]

    cs.LG

    SimuGen: Multi-modal Agentic Framework for Constructing Block Diagram-Based Simulation Models

    Authors: Xinxing Ren, Qianbo Zang, Zekun Guo

    Abstract: Recent advances in large language models (LLMs) have shown impressive performance in mathematical reasoning and code generation. However, LLMs still struggle in the simulation domain, particularly in generating Simulink models, which are essential tools in engineering and scientific research. Our preliminary experiments indicate that LLM agents often fail to produce reliable and complete Simulink…

    Submitted 27 May, 2025; originally announced June 2025.

  37. arXiv:2506.15685  [pdf, ps, other]

    cs.LG cs.AI

    Ignition Phase: Standard Training for Fast Adversarial Robustness

    Authors: Wang Yu-Hang, Liu ying, Fang liang, Wang Xuelin, Junkang Guo, Shiwei Li, Lei Gao, Jian Liu, Wenfei Yin

    Abstract: Adversarial Training (AT) is a cornerstone defense, but many variants overlook foundational feature representations by primarily focusing on stronger attack generation. We introduce Adversarial Evolution Training (AET), a simple yet powerful framework that strategically prepends an Empirical Risk Minimization (ERM) phase to conventional AT. We hypothesize this initial ERM phase cultivates a favora…

    Submitted 25 May, 2025; originally announced June 2025.

  38. arXiv:2506.15647  [pdf, ps, other]

    cs.AI

    Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement

    Authors: Weixiang Zhao, Jiahe Guo, Yang Deng, Xingyu Sui, Yulin Hu, Yanyan Zhao, Wanxiang Che, Bing Qin, Tat-Seng Chua, Ting Liu

    Abstract: Recent advancements in large reasoning models (LRMs) have significantly enhanced language models' capabilities in complex problem-solving by emulating human-like deliberative thinking. However, these models often exhibit overthinking (i.e., the generation of unnecessarily verbose and redundant content), which hinders efficiency and inflates inference cost. In this work, we explore the representati…

    Submitted 18 June, 2025; originally announced June 2025.

  39. arXiv:2506.15645  [pdf, ps, other]

    cs.CV cs.AI

    Demystifying the Visual Quality Paradox in Multimodal Large Language Models

    Authors: Shuo Xing, Lanqing Guo, Hongyuan Hua, Seoyoung Lee, Peiran Li, Yufei Wang, Zhangyang Wang, Zhengzhong Tu

    Abstract: Recent Multimodal Large Language Models (MLLMs) excel on benchmark vision-language tasks, yet little is known about how input visual quality shapes their responses. Does higher perceptual quality of images already translate to better MLLM understanding? We conduct the first systematic study spanning leading MLLMs and a suite of vision-language benchmarks, applying controlled degradations and styli…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 18 pages

  40. arXiv:2506.15624  [pdf, ps, other]

    cs.AI

    The Effect of State Representation on LLM Agent Behavior in Dynamic Routing Games

    Authors: Lyle Goodyear, Rachel Guo, Ramesh Johari

    Abstract: Large Language Models (LLMs) have shown promise as decision-makers in dynamic settings, but their stateless nature necessitates creating a natural language representation of history. We present a unifying framework for systematically constructing natural language "state" representations for prompting LLM agents in repeated multi-agent games. Previous work on games with LLM agents has taken an ad h…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 27 pages, 20 figures

    ACM Class: I.2.11; I.6.4; F.1.2; F.2.2; G.3; J.7

  41. arXiv:2506.15524  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Image Shadow Removal Challenge Report

    Authors: Florin-Alexandru Vasluianu, Tim Seizinger, Zhuyun Zhou, Cailian Chen, Zongwei Wu, Radu Timofte, Mingjia Li, Jin Hu, Hainuo Wang, Hengxing Liu, Jiarui Wang, Qiming Hu, Xiaojie Guo, Xin Lu, Jiarong Yang, Yuanfei Bao, Anya Hu, Zihao Fan, Kunyu Wang, Jie Xiao, Xi Wang, Xueyang Fu, Zheng-Jun Zha, Yu-Fan Lin, Chia-Ming Lee, et al. (57 additional authors not shown)

    Abstract: This work examines the findings of the NTIRE 2025 Shadow Removal Challenge. A total of 306 participants registered, with 17 teams successfully submitting their solutions during the final evaluation phase. Following the last two editions, this challenge had two evaluation tracks: one focusing on reconstruction fidelity and the other on visual perception through a user study. Both tracks were e…

    Submitted 18 June, 2025; originally announced June 2025.

  42. arXiv:2506.15451  [pdf, ps, other

    cs.CL

    AgentGroupChat-V2: Divide-and-Conquer Is What LLM-Based Multi-Agent System Need

    Authors: Zhouhong Gu, Xiaoxuan Zhu, Yin Cai, Hao Shen, Xingzhou Chen, Qingyi Wang, Jialin Li, Xiaoran Shi, Haoran Guo, Wenxuan Huang, Hongwei Feng, Yanghua Xiao, Zheyu Ye, Yao Hu, Shaosheng Cao

    Abstract: Large language model based multi-agent systems have demonstrated significant potential in social simulation and complex task resolution domains. However, current frameworks face critical challenges in system architecture design, cross-domain generalizability, and performance guarantees, particularly as task complexity and the number of agents increase. We introduce AgentGroupChat-V2, a novel framewo…

    Submitted 18 June, 2025; originally announced June 2025.

  43. arXiv:2506.15442

    cs.CV cs.AI

    Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material

    Authors: Team Hunyuan3D, Shuhui Yang, Mingxin Yang, Yifei Feng, Xin Huang, Sheng Zhang, Zebin He, Di Luo, Haolin Liu, Yunfei Zhao, Qingxiang Lin, Zeqiang Lai, Xianghui Yang, Huiwen Shi, Zibo Zhao, Bowen Zhang, Hongyu Yan, Lifu Wang, Sicong Liu, Jihong Zhang, Meng Chen, Liang Dong, Yiwen Jia, Yulin Cai, Jiaao Yu, et al. (28 additional authors not shown)

    Abstract: 3D AI-generated content (AIGC) is a passionate field that has significantly accelerated the creation of 3D models in gaming, film, and design. Despite the development of several groundbreaking models that have revolutionized 3D generation, the field remains largely accessible only to researchers, developers, and designers due to the complexities involved in collecting, processing, and training 3D…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: GitHub link: https://github.com/Tencent-Hunyuan/Hunyuan3D-2.1

  44. arXiv:2506.15395

    eess.IV cs.AI cs.CV

    A Real-time Endoscopic Image Denoising System

    Authors: Yu Xing, Shishi Huang, Meng Lv, Guo Chen, Huailiang Wang, Lingzhi Sui

    Abstract: Endoscopes featuring a miniaturized design have significantly enhanced operational flexibility, portability, and diagnostic capability while substantially reducing the invasiveness of medical procedures. Recently, single-use endoscopes equipped with an ultra-compact analogue image sensor measuring less than 1 mm × 1 mm have brought revolutionary advancements to medical diagnosis. They reduce the structural…

    Submitted 18 June, 2025; originally announced June 2025.

  45. arXiv:2506.15155

    cs.DC

    eLLM: Elastic Memory Management Framework for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Yi Xiong, Cong Guo, Zihan Liu, Yangjie Zhou, Weiming Hu, Hao Wu, Changxu Shao, Ziqing Wang, Yongjie Yuan, Junping Zhao, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models are increasingly being deployed in datacenters. Serving these models requires careful memory management, as their memory usage includes static weights, dynamic activations, and key-value caches. While static weights are constant and predictable, dynamic components such as activations and KV caches change frequently during runtime, presenting significant challenges for efficie…

    Submitted 18 June, 2025; originally announced June 2025.

  46. arXiv:2506.15115

    cs.LG

    Towards Reliable Forgetting: A Survey on Machine Unlearning Verification, Challenges, and Future Directions

    Authors: Lulu Xue, Shengshan Hu, Wei Lu, Yan Shen, Dongxu Li, Peijin Guo, Ziqi Zhou, Minghui Li, Yanjun Zhang, Leo Yu Zhang

    Abstract: With growing demands for privacy protection, security, and legal compliance (e.g., GDPR), machine unlearning has emerged as a critical technique for ensuring the controllability and regulatory alignment of machine learning models. However, a fundamental challenge in this field lies in effectively verifying whether unlearning operations have been successfully and thoroughly executed. Despite a grow…

    Submitted 17 June, 2025; originally announced June 2025.

  47. arXiv:2506.15078

    cs.CV cs.LG

    Enhancing Vector Quantization with Distributional Matching: A Theoretical and Empirical Study

    Authors: Xianghong Fang, Litao Guo, Hengchao Chen, Yuxuan Zhang, Xiaofan Xia, Dingjie Song, Yexin Liu, Hao Wang, Harry Yang, Yuan Yuan, Qiang Sun

    Abstract: The success of autoregressive models largely depends on the effectiveness of vector quantization, a technique that discretizes continuous features by mapping them to the nearest code vectors within a learnable codebook. Two critical issues in existing vector quantization methods are training instability and codebook collapse. Training instability arises from the gradient discrepancy introduced by…

    Submitted 17 June, 2025; originally announced June 2025.

  48. arXiv:2506.14853

    q-bio.QM cs.LG

    DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing

    Authors: Max Ku, Sun Sun, Hongyu Guo, Wenhu Chen

    Abstract: We introduce DisProtEdit, a controllable protein editing framework that leverages dual-channel natural language supervision to learn disentangled representations of structural and functional properties. Unlike prior approaches that rely on joint holistic embeddings, DisProtEdit explicitly separates semantic factors, enabling modular and interpretable control. To support this, we construct SwissPro…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to ICMLW (GenBio) 2025 and ICMLW (FM4LS) 2025

  49. arXiv:2506.14851

    cs.DC cs.AI cs.LG

    Efficient Serving of LLM Applications with Probabilistic Demand Modeling

    Authors: Yifei Liu, Zuo Gan, Zhenghao Gan, Weiye Wang, Chen Chen, Yizhou Shan, Xusheng Chen, Zhenhua Han, Yifei Zhu, Shixuan Sun, Minyi Guo

    Abstract: Applications based on Large Language Models (LLMs) contain a series of tasks to address real-world problems with boosted capability, which have dynamic demand volumes on diverse backends. Existing serving systems treat the resource demands of LLM applications as a black box, compromising end-to-end efficiency due to improper queuing order and backend warm-up latency. We find that the resource dema…

    Submitted 16 June, 2025; originally announced June 2025.

  50. arXiv:2506.14769

    cs.CV cs.RO

    CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion

    Authors: Jiahua Ma, Yiran Qin, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang

    Abstract: Diffusion Policy (DP) enables robots to learn complex behaviors by imitating expert demonstrations through action diffusion. However, in practical applications, hardware limitations often degrade data quality, while real-time constraints restrict model inference to instantaneous state and scene observations. These limitations seriously reduce the efficacy of learning from expert demonstrations, re…

    Submitted 17 June, 2025; originally announced June 2025.