
Showing 1–50 of 660 results for author: Chenyang

Searching in archive cs.
  1. arXiv:2506.15790  [pdf, ps, other]

    cs.CR cs.SE

    ETrace: Event-Driven Vulnerability Detection in Smart Contracts via LLM-Based Trace Analysis

    Authors: Chenyang Peng, Haijun Wang, Yin Wu, Hao Wu, Ming Fan, Yitao Zhao, Ting Liu

    Abstract: With the growing application of blockchain technology in various fields, ensuring the security and stability of smart contracts has emerged as a critical challenge. Current security analysis methodologies in vulnerability detection can be categorized into static analysis and dynamic analysis methods. However, these existing traditional vulnerability detection methods predominantly rely on analyzing…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 4 pages, 1 figure. Submitted to the 16th Asia-Pacific Symposium on Internetware (Internetware 2025)

    ACM Class: D.2

  2. arXiv:2506.15672  [pdf, ps, other]

    cs.AI cs.MA

    SwarmAgentic: Towards Fully Automated Agentic System Generation via Swarm Intelligence

    Authors: Yao Zhang, Chenyang Lin, Shijie Tang, Haokun Chen, Shijie Zhou, Yunpu Ma, Volker Tresp

    Abstract: The rapid progress of Large Language Models has advanced agentic systems in decision-making, coordination, and task execution. Yet, existing agentic system generation frameworks lack full autonomy, missing from-scratch agent generation, self-optimizing agent functionality, and collaboration, limiting adaptability and scalability. We propose SwarmAgentic, a framework for fully automated agentic sys…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 41 pages

  3. arXiv:2506.15655  [pdf, ps, other]

    cs.SE cs.AI cs.CL cs.IR

    cAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree

    Authors: Yilin Zhang, Xinran Zhao, Zora Zhiruo Wang, Chenyang Yang, Jiayi Wei, Tongshuang Wu

    Abstract: Retrieval-Augmented Generation (RAG) has become essential for large-scale code generation, grounding predictions in external code corpora to improve factuality. However, a critical yet underexplored aspect of RAG pipelines is chunking -- the process of dividing documents into retrievable units. Existing line-based chunking heuristics often break semantic structures, splitting functions or merging u…

    Submitted 18 June, 2025; originally announced June 2025.

  4. arXiv:2506.15610  [pdf, ps, other]

    cs.CV

    BoxFusion: Reconstruction-Free Open-Vocabulary 3D Object Detection via Real-Time Multi-View Box Fusion

    Authors: Yuqing Lan, Chenyang Zhu, Zhirui Gao, Jiazhao Zhang, Yihan Cao, Renjiao Yi, Yijie Wang, Kai Xu

    Abstract: Open-vocabulary 3D object detection has gained significant interest due to its critical applications in autonomous driving and embodied AI. Existing detection methods, whether offline or online, typically rely on dense point cloud reconstruction, which imposes substantial computational overhead and memory constraints, hindering real-time deployment in downstream tasks. To address this, we propose…

    Submitted 18 June, 2025; originally announced June 2025.

    Comments: 11 pages, 6 figures

  5. arXiv:2506.13322  [pdf, ps, other]

    cs.CV cs.AI

    Active Multimodal Distillation for Few-shot Action Recognition

    Authors: Weijia Feng, Yichen Zhu, Ruojia Zhang, Chenyang Wang, Fei Ma, Xiaobao Wang, Xiaobai Li

    Abstract: Owing to its rapid progress and broad application prospects, few-shot action recognition has attracted considerable interest. However, current methods are predominantly based on limited single-modal data, which does not fully exploit the potential of multimodal information. This paper presents a novel framework that actively identifies reliable modalities for each sample using task-specific contex…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: IJCAI 2025, the 34th International Joint Conference on Artificial Intelligence

  6. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  7. arXiv:2506.11820  [pdf, ps, other]

    cs.CV cs.CL

    Rethinking Multilingual Vision-Language Translation: Dataset, Evaluation, and Adaptation

    Authors: Xintong Wang, Jingheng Pan, Yixiao Liu, Xiaohu Zhao, Chenyang Lyu, Minghao Wu, Chris Biemann, Longyue Wang, Linlong Xu, Weihua Luo, Kaifu Zhang

    Abstract: Vision-Language Translation (VLT) is a challenging task that requires accurately recognizing multilingual text embedded in images and translating it into the target language with the support of visual context. While recent Large Vision-Language Models (LVLMs) have demonstrated strong multilingual and visual understanding capabilities, there is a lack of systematic evaluation and understanding of t…

    Submitted 13 June, 2025; originally announced June 2025.

  8. arXiv:2506.11066  [pdf, ps, other]

    cs.SE cs.AI

    CoQuIR: A Comprehensive Benchmark for Code Quality-Aware Information Retrieval

    Authors: Jiahui Geng, Fengyu Cai, Shaobo Cui, Qing Li, Liangwei Chen, Chenyang Lyu, Haonan Li, Derui Zhu, Walter Pretschner, Heinz Koeppl, Fakhri Karray

    Abstract: Code retrieval is essential in modern software development, as it boosts code reuse and accelerates debugging. However, current benchmarks primarily emphasize functional relevance while neglecting critical dimensions of software quality. Motivated by this gap, we introduce CoQuIR, the first large-scale, multilingual benchmark specifically designed to evaluate quality-aware code retrieval across fo…

    Submitted 31 May, 2025; originally announced June 2025.

  9. arXiv:2506.10289  [pdf, ps, other]

    eess.AS cs.AI

    RT-VC: Real-Time Zero-Shot Voice Conversion with Speech Articulatory Coding

    Authors: Yisi Liu, Chenyang Wang, Hanjo Kim, Raniya Khan, Gopala Anumanchipalli

    Abstract: Voice conversion has emerged as a pivotal technology in numerous applications ranging from assistive communication to entertainment. In this paper, we present RT-VC, a zero-shot real-time voice conversion system that delivers ultra-low latency and high-quality performance. Our approach leverages an articulatory feature space to naturally disentangle content and speaker characteristics, facilitatin…

    Submitted 11 June, 2025; originally announced June 2025.

    Comments: ACL Demo Track 2025

  10. arXiv:2506.09046  [pdf, ps, other]

    cs.LG cs.AI cs.MA

    Agentic Neural Networks: Self-Evolving Multi-Agent Systems via Textual Backpropagation

    Authors: Xiaowen Ma, Chenyang Lin, Yao Zhang, Volker Tresp, Yunpu Ma

    Abstract: Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design,…

    Submitted 10 June, 2025; originally announced June 2025.

  11. arXiv:2506.08490  [pdf, ps, other]

    cs.CL

    Integration of Old and New Knowledge for Generalized Intent Discovery: A Consistency-driven Prototype-Prompting Framework

    Authors: Xiao Wei, Xiaobao Wang, Ning Zhuang, Chenyang Wang, Longbiao Wang, Jianwu Dang

    Abstract: Intent detection aims to identify user intents from natural language inputs, where supervised methods rely heavily on labeled in-domain (IND) data and struggle with out-of-domain (OOD) intents, limiting their practical applicability. Generalized Intent Discovery (GID) addresses this by leveraging unlabeled OOD data to discover new intents without additional annotation. However, existing methods fo…

    Submitted 10 June, 2025; originally announced June 2025.

    Comments: 9 pages, 2 figures, 7 tables, IJCAI 2025

  12. arXiv:2506.08446  [pdf, ps, other]

    cs.AI cs.CL

    A Survey on Large Language Models for Mathematical Reasoning

    Authors: Peng-Yuan Wang, Tian-Shuo Liu, Chenyang Wang, Yi-Di Wang, Shu Yan, Cheng-Xing Jia, Xu-Hui Liu, Xin-Wei Chen, Jia-Cheng Xu, Ziniu Li, Yang Yu

    Abstract: Mathematical reasoning has long represented one of the most fundamental and challenging frontiers in artificial intelligence research. In recent years, large language models (LLMs) have achieved significant advances in this area. This survey examines the development of mathematical reasoning abilities in LLMs through two high-level cognitive phases: comprehension, where models gain mathematical un…

    Submitted 10 June, 2025; originally announced June 2025.

  13. arXiv:2506.07986  [pdf, ps, other]

    cs.CV

    Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers

    Authors: Zhengyao Lv, Tianlin Pan, Chenyang Si, Zhaoxi Chen, Wangmeng Zuo, Ziwei Liu, Kwan-Yee K. Wong

    Abstract: Multimodal Diffusion Transformers (MM-DiTs) have achieved remarkable progress in text-driven visual generation. However, even state-of-the-art MM-DiT models like FLUX struggle with achieving precise alignment between text prompts and generated content. We identify two key issues in the attention mechanism of MM-DiT, namely 1) the suppression of cross-modal attention due to token imbalance between…

    Submitted 11 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

    Comments: Project Page: https://vchitect.github.io/TACA/

  14. arXiv:2506.05901  [pdf, other]

    cs.CL cs.AI

    Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router

    Authors: Chenyang Shao, Xinyang Liu, Yutang Lin, Fengli Xu, Yong Li

    Abstract: Multi-step reasoning has proven essential for enhancing the problem-solving capabilities of Large Language Models (LLMs) by decomposing complex tasks into intermediate steps, either explicitly or implicitly. Extending the reasoning chain at test time through deeper thought processes or broader exploration can further improve performance, but often incurs substantial costs due to the explosion in…

    Submitted 6 June, 2025; originally announced June 2025.

  15. arXiv:2506.05207  [pdf, ps, other]

    cs.CV

    Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning

    Authors: Yue Ma, Yulong Liu, Qiyuan Zhu, Ayden Yang, Kunyu Feng, Xinhua Zhang, Zhifeng Li, Sirui Han, Chenyang Qi, Qifeng Chen

    Abstract: Recently, breakthroughs in the video diffusion transformer have shown remarkable capabilities in diverse motion generations. As for the motion-transfer task, current methods mainly use two-stage Low-Rank Adaptations (LoRAs) finetuning to obtain better performance. However, existing adaptation-based motion transfer still suffers from motion inconsistency and tuning inefficiency when applied to larg…

    Submitted 5 June, 2025; originally announced June 2025.

    Comments: project page: https://follow-your-motion.github.io/

  16. arXiv:2506.03722  [pdf, other]

    cs.CL cs.SD eess.AS

    MFLA: Monotonic Finite Look-ahead Attention for Streaming Speech Recognition

    Authors: Yinfeng Xia, Huiyan Li, Chenyang Le, Manhong Wang, Yutao Sun, Xingyang Ma, Yanmin Qian

    Abstract: Applying large pre-trained speech models like Whisper has shown promise in reducing training costs for various speech tasks. However, integrating these models into streaming systems remains a challenge. This paper presents a novel prefix-to-prefix training framework for streaming recognition by fine-tuning Whisper. We introduce the Continuous Integrate-and-Fire mechanism to establish a quasi-m…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by Interspeech 2025

  17. arXiv:2506.03123  [pdf, ps, other]

    cs.CV

    DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

    Authors: Zhengyao Lv, Chenyang Si, Tianlin Pan, Zhaoxi Chen, Kwan-Yee K. Wong, Yu Qiao, Ziwei Liu

    Abstract: Diffusion Models have achieved remarkable results in video synthesis but require iterative denoising steps, leading to substantial computational overhead. Consistency Models have made significant progress in accelerating diffusion models. However, directly applying them to video diffusion models often results in severe degradation of temporal consistency and appearance details. In this paper, by a…

    Submitted 3 June, 2025; originally announced June 2025.

  18. arXiv:2506.01953  [pdf, ps, other]

    cs.RO

    Fast-in-Slow: A Dual-System Foundation Model Unifying Fast Manipulation within Slow Reasoning

    Authors: Hao Chen, Jiaming Liu, Chenyang Gu, Zhuoyang Liu, Renrui Zhang, Xiaoqi Li, Xiao He, Yandong Guo, Chi-Wing Fu, Shanghang Zhang, Pheng-Ann Heng

    Abstract: Generalized policy and execution efficiency constitute the two critical challenges in robotic manipulation. While recent foundation policies benefit from the common-sense reasoning capabilities of internet-scale pretrained vision-language models (VLMs), they often suffer from low execution frequency. To mitigate this dilemma, dual-system approaches, inspired by Kahneman's theory, have been propose…

    Submitted 2 June, 2025; originally announced June 2025.

  19. arXiv:2506.00979  [pdf, ps, other]

    cs.CV cs.AI

    IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection

    Authors: Wayne Zhang, Changjiang Jiang, Zhonghao Zhang, Chenyang Si, Fengchang Yu, Wei Peng

    Abstract: The rapid advancement of Artificial Intelligence Generated Content (AIGC) in visual domains has resulted in highly realistic synthetic images and videos, driven by sophisticated generative frameworks such as diffusion-based architectures. While these breakthroughs open substantial opportunities, they simultaneously raise critical concerns about content authenticity and integrity. Many current AIGC…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: 20 pages, 13 figures, 7 tables

  20. arXiv:2506.00088  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    HD-NDEs: Neural Differential Equations for Hallucination Detection in LLMs

    Authors: Qing Li, Jiahui Geng, Zongxiong Chen, Derui Zhu, Yuxia Wang, Congbo Ma, Chenyang Lyu, Fakhri Karray

    Abstract: In recent years, large language models (LLMs) have made remarkable advancements, yet hallucination, where models produce inaccurate or non-factual statements, remains a significant challenge for real-world deployment. Although current classification-based methods, such as SAPLMA, are highly efficient in mitigating hallucinations, they struggle when non-factual information arises in the early or mi…

    Submitted 30 May, 2025; originally announced June 2025.

  21. arXiv:2505.24369  [pdf, ps, other]

    cs.LG cs.AI

    Adversarial Preference Learning for Robust LLM Alignment

    Authors: Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, Yijun Niu, Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang

    Abstract: Modern language models often rely on Reinforcement Learning from Human Feedback (RLHF) to encourage safe behaviors. However, they remain vulnerable to adversarial attacks due to three key limitations: (1) the inefficiency and high cost of human annotation, (2) the vast diversity of potential adversarial attacks, and (3) the risk of feedback bias and reward hacking. To address these challenges, we…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted at ACL 2025 Findings

  22. arXiv:2505.23932  [pdf, ps, other]

    cs.CL

    SwingArena: Competitive Programming Arena for Long-context GitHub Issue Solving

    Authors: Wendong Xu, Jing Xiong, Chenyang Zhao, Qiujiang Chen, Haoran Wang, Hui Shen, Zhongwei Wan, Jianbo Dai, Taiqiang Wu, He Xiao, Chaofan Tao, Z. Morley Mao, Ying Sheng, Zhijiang Guo, Hongxia Yang, Bei Yu, Lingpeng Kong, Quanquan Gu, Ngai Wong

    Abstract: We present SwingArena, a competitive evaluation framework for Large Language Models (LLMs) that closely mirrors real-world software development workflows. Unlike traditional static benchmarks, SwingArena models the collaborative process of software iteration by pairing LLMs as submitters, who generate patches, and reviewers, who create test cases and verify the patches through continuous integrati…

    Submitted 2 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  23. arXiv:2505.22101  [pdf, other]

    cs.CL

    MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models

    Authors: Zhiyu Li, Shichao Song, Hanyu Wang, Simin Niu, Ding Chen, Jiawei Yang, Chenyang Xi, Huayi Lai, Jihao Zhao, Yezhaohui Wang, Junpeng Ren, Zehao Lin, Jiahao Huo, Tianyi Chen, Kai Chen, Kehang Li, Zhiqiang Yin, Qingchen Yu, Bo Tang, Hongkang Yang, Zhi-Qin John Xu, Feiyu Xiong

    Abstract: Large Language Models (LLMs) have emerged as foundational infrastructure in the pursuit of Artificial General Intelligence (AGI). Despite their remarkable capabilities in language perception and generation, current LLMs fundamentally lack a unified and structured architecture for handling memory. They primarily rely on parametric memory (knowledge encoded in model weights) and ephemeral activation…

    Submitted 28 May, 2025; originally announced May 2025.

  24. arXiv:2505.21943  [pdf, ps, other]

    cs.CV

    Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting

    Authors: Wei Lin, Chenyang Zhao, Antoni B. Chan

    Abstract: Point detection has been developed to locate pedestrians in crowded scenes by training a counter through a point-to-point (P2P) supervision scheme. Despite its excellent localization and counting performance, training a point-based counter still faces challenges concerning annotation labor: hundreds to thousands of points are required to annotate a single sample capturing a dense crowd. In this pa…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Accepted by CVPR 2025 (highlight)

  25. arXiv:2505.21849  [pdf, other]

    cs.IR cs.AI

    Xinyu AI Search: Enhanced Relevance and Comprehensive Results with Rich Answer Presentations

    Authors: Bo Tang, Junyi Zhu, Chenyang Xi, Yunhang Ge, Jiahao Wu, Yuchen Feng, Yijun Niu, Wenqiang Wei, Yu Yu, Chunyu Li, Zehao Lin, Hao Wu, Ning Liao, Yebin Yang, Jiajia Wang, Zhiyu Li, Feiyu Xiong, Jingrun Chen

    Abstract: Traditional search engines struggle to synthesize fragmented information for complex queries, while generative AI search engines face challenges in relevance, comprehensiveness, and presentation. To address these limitations, we introduce Xinyu AI Search, a novel system that incorporates a query-decomposition graph to dynamically break down complex queries into sub-queries, enabling stepwise retri…

    Submitted 27 May, 2025; originally announced May 2025.

  26. arXiv:2505.21502  [pdf, ps, other]

    cs.CV

    Generalizable and Relightable Gaussian Splatting for Human Novel View Synthesis

    Authors: Yipengjing Sun, Chenyang Wang, Shunyuan Zheng, Zonglin Li, Shengping Zhang, Xiangyang Ji

    Abstract: We propose GRGS, a generalizable and relightable 3D Gaussian framework for high-fidelity human novel view synthesis under diverse lighting conditions. Unlike existing methods that rely on per-character optimization or ignore physical constraints, GRGS adopts a feed-forward, fully supervised strategy that projects geometry, material, and illumination cues from multi-view 2D observations into 3D Gau…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Project Webpage: https://sypj-98.github.io/grgs/

  27. arXiv:2505.20362  [pdf, other]

    cs.IR cs.AI

    VSCBench: Bridging the Gap in Vision-Language Model Safety Calibration

    Authors: Jiahui Geng, Qing Li, Zongxiong Chen, Yuxia Wang, Derui Zhu, Zhuohan Xie, Chenyang Lyu, Xiuying Chen, Preslav Nakov, Fakhri Karray

    Abstract: The rapid advancement of vision-language models (VLMs) has brought a lot of attention to their safety alignment. However, existing methods have primarily focused on model undersafety, where the model responds to hazardous queries, while neglecting oversafety, where the model refuses to answer safe queries. In this paper, we introduce the concept of safety calibration, which systematical…

    Submitted 26 May, 2025; originally announced May 2025.

  28. arXiv:2505.18949  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    The Price of Format: Diversity Collapse in LLMs

    Authors: Longfei Yun, Chenyang An, Zilong Wang, Letian Peng, Jingbo Shang

    Abstract: Instruction-tuned large language models (LLMs) employ structured templates, such as role markers and special tokens, to enforce format consistency during inference. However, we identify a critical limitation of such formatting: it induces a phenomenon we term diversity collapse, where the model generates semantically similar outputs for open-ended inputs, undermining creativity and variability. We…

    Submitted 24 May, 2025; originally announced May 2025.

    Comments: 14 pages, 7 figures

  29. arXiv:2505.17660  [pdf, ps, other]

    cs.LG

    DAM-GT: Dual Positional Encoding-Based Attention Masking Graph Transformer for Node Classification

    Authors: Chenyang Li, Jinsong Chen, John E. Hopcroft, Kun He

    Abstract: Neighborhood-aware tokenized graph Transformers have recently shown great potential for node classification tasks. Despite their effectiveness, our in-depth analysis of neighborhood tokens reveals two critical limitations in the existing paradigm. First, current neighborhood token generation methods fail to adequately capture attribute correlations within a neighborhood. Second, the conventional s…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: Preprint version

  30. arXiv:2505.15778  [pdf, other]

    cs.CL cs.AI

    Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space

    Authors: Zhen Zhang, Xuehai He, Weixiang Yan, Ao Shen, Chenyang Zhao, Shuohang Wang, Yelong Shen, Xin Eric Wang

    Abstract: Human cognition typically involves thinking through abstract, fluid concepts rather than strictly using discrete linguistic tokens. Current reasoning models, however, are constrained to reasoning within the boundaries of human language, processing discrete token embeddings that represent fixed points in the semantic space. This discrete constraint restricts the expressive power and upper potential…

    Submitted 21 May, 2025; originally announced May 2025.

  31. arXiv:2505.14244  [pdf, ps, other]

    cs.CL

    TransBench: Benchmarking Machine Translation for Industrial-Scale Applications

    Authors: Haijun Li, Tianqi Shi, Zifu Shang, Yuxuan Han, Xueyu Zhao, Hao Wang, Yu Qian, Zhiqiang Qian, Linlong Xu, Minghao Wu, Chenyang Lyu, Longyue Wang, Gongbo Tang, Weihua Luo, Zhao Xu, Kaifu Zhang

    Abstract: Machine translation (MT) has become indispensable for cross-border communication in globalized industries like e-commerce, finance, and legal services, with recent advancements in large language models (LLMs) significantly enhancing translation quality. However, applying general-purpose MT models to industrial scenarios reveals critical limitations due to domain-specific terminology, cultural nuan…

    Submitted 20 May, 2025; originally announced May 2025.

  32. arXiv:2505.13633  [pdf, ps, other]

    cs.CV

    IPENS: Interactive Unsupervised Framework for Rapid Plant Phenotyping Extraction via NeRF-SAM2 Fusion

    Authors: Wentao Song, He Huang, Youqiang Sun, Fang Qu, Jiaqi Zhang, Longhui Fang, Yuwei Hao, Chenyang Peng

    Abstract: Advanced plant phenotyping technologies play a crucial role in targeted trait improvement and accelerating intelligent breeding. Due to the species diversity of plants, existing methods heavily rely on large-scale high-precision manually annotated data. For self-occluded objects at the grain level, unsupervised methods often prove ineffective. This study proposes IPENS, an interactive unsupervised…

    Submitted 19 May, 2025; originally announced May 2025.

  33. arXiv:2505.13506  [pdf, other]

    cs.CL cs.AI

    EcoSafeRAG: Efficient Security through Context Analysis in Retrieval-Augmented Generation

    Authors: Ruobing Yao, Yifei Zhang, Shuang Song, Neng Gao, Chenyang Tu

    Abstract: Retrieval-Augmented Generation (RAG) compensates for the static knowledge limitations of Large Language Models (LLMs) by integrating external knowledge, producing responses with enhanced factual correctness and query-specific contextualization. However, it also introduces new attack surfaces such as corpus poisoning at the same time. Most of the existing defense methods rely on the internal knowle…

    Submitted 16 May, 2025; originally announced May 2025.

  34. arXiv:2505.13360  [pdf, ps, other]

    cs.CL cs.SE

    What Prompts Don't Say: Understanding and Managing Underspecification in LLM Prompts

    Authors: Chenyang Yang, Yike Shi, Qianou Ma, Michael Xieyang Liu, Christian Kästner, Tongshuang Wu

    Abstract: Building LLM-powered software requires developers to communicate their requirements through natural language, but developer prompts are frequently underspecified, failing to fully capture many user-important requirements. In this paper, we present an in-depth analysis of prompt underspecification, showing that while LLMs can often (41.1%) guess unspecified requirements by default, such behavior is…

    Submitted 19 May, 2025; originally announced May 2025.

  35. arXiv:2505.13220  [pdf, ps, other]

    cs.CL

    SeedBench: A Multi-task Benchmark for Evaluating Large Language Models in Seed Science

    Authors: Jie Ying, Zihong Chen, Zhefan Wang, Wanli Jiang, Chenyang Wang, Zhonghang Yuan, Haoyang Su, Huanjun Kong, Fan Yang, Nanqing Dong

    Abstract: Seed science is essential for modern agriculture, directly influencing crop yields and global food security. However, challenges such as interdisciplinary complexity and high costs with limited returns hinder progress, leading to a shortage of experts and insufficient technological support. While large language models (LLMs) have shown promise across various fields, their application in seed scien…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: Accepted by ACL 2025

  36. arXiv:2505.12126  [pdf, ps, other]

    cs.DS

    Fair Submodular Maximization over a Knapsack Constraint

    Authors: Lijun Li, Chenyang Xu, Liuyi Yang, Ruilong Zhang

    Abstract: We consider fairness in submodular maximization subject to a knapsack constraint, a fundamental problem with various applications in economics, machine learning, and data mining. In the model, we are given a set of ground elements, each associated with a weight and a color, and a monotone submodular function defined over them. The goal is to maximize the submodular function while guaranteeing that…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: To appear in IJCAI 2025

  37. arXiv:2505.12123  [pdf, ps, other]

    cs.DS

    Logarithmic Approximations for Fair k-Set Selection

    Authors: Shi Li, Chenyang Xu, Ruilong Zhang

    Abstract: We study the fair k-set selection problem where we aim to select $k$ sets from a given set system such that the (weighted) occurrence times that each element appears in these $k$ selected sets are balanced, i.e., the maximum (weighted) occurrence times are minimized. By observing that a set system can be formulated into a bipartite graph $G:=(L\cup R, E)$, our problem is equivalent to selecting…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: To appear in IJCAI 2025

  38. arXiv:2505.09915  [pdf, ps, other]

    cs.CV cs.RO

    Large-Scale Gaussian Splatting SLAM

    Authors: Zhe Xin, Chenyang Wu, Penghui Huang, Yanyong Zhang, Yinian Mao, Guoquan Huang

    Abstract: The recently developed Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have shown encouraging and impressive results for visual SLAM. However, most representative methods require RGBD sensors and are only available for indoor environments. The robustness of reconstruction in large-scale outdoor scenarios remains unexplored. This paper introduces a large-scale 3DGS-based visual SLAM…

    Submitted 14 May, 2025; originally announced May 2025.

  39. arXiv:2505.04306  [pdf, ps, other]

    cs.CV

    MoDE: Mixture of Diffusion Experts for Any Occluded Face Recognition

    Authors: Qiannan Fan, Zhuoyang Li, Jitong Li, Chenyang Cao

    Abstract: With the continuous impact of epidemics, people have become accustomed to wearing masks. However, most current occluded face recognition (OFR) algorithms lack prior knowledge of occlusions, resulting in poor performance when dealing with occluded faces of varying types and severity in reality. Recognizing occluded faces is still a significant challenge, which greatly affects the convenience of peo…

    Submitted 7 May, 2025; originally announced May 2025.

    Comments: 8 pages, 7 figures

    ACM Class: I.4.8; I.5.4; I.2.10

  40. arXiv:2505.01656  [pdf, other]

    cs.CV

    A Novel WaveInst-based Network for Tree Trunk Structure Extraction and Pattern Analysis in Forest Inventory

    Authors: Chenyang Fan, Xujie Zhu, Taige Luo, Sheng Xu, Zhulin Chen, Hongxin Yang

    Abstract: The pattern analysis of tree structure holds significant scientific value for genetic breeding and forestry management. The current trunk and branch extraction technologies are mainly LiDAR-based or UAV-based. The former approaches obtain high-precision 3D data, but their equipment cost is high and the three-dimensional (3D) data processing is complex. The latter approaches efficiently capture canop…

    Submitted 2 May, 2025; originally announced May 2025.

  41. arXiv:2505.00308  [pdf]

    cs.CV cs.AI stat.AP

    AI-Assisted Decision-Making for Clinical Assessment of Auto-Segmented Contour Quality

    Authors: Biling Wang, Austen Maniscalco, Ti Bai, Siqiu Wang, Michael Dohopolski, Mu-Han Lin, Chenyang Shen, Dan Nguyen, Junzhou Huang, Steve Jiang, Xinlei Wang

    Abstract: Purpose: This study presents a Deep Learning (DL)-based quality assessment (QA) approach for evaluating auto-generated contours (auto-contours) in radiotherapy, with emphasis on Online Adaptive Radiotherapy (OART). Leveraging Bayesian Ordinal Classification (BOC) and calibrated uncertainty thresholds, the method enables confident QA predictions without relying on ground truth contours or extensive…

    Submitted 11 May, 2025; v1 submitted 1 May, 2025; originally announced May 2025.

  42. arXiv:2504.21585  [pdf, other]

    cs.RO cs.AI eess.SY

    Multi-Goal Dexterous Hand Manipulation using Probabilistic Model-based Reinforcement Learning

    Authors: Yingzhuo Jiang, Wenjun Huang, Rongdun Lin, Chenyang Miao, Tianfu Sun, Yunduan Cui

    Abstract: This paper tackles the challenge of learning multi-goal dexterous hand manipulation tasks using model-based Reinforcement Learning. We propose Goal-Conditioned Probabilistic Model Predictive Control (GC-PMPC) by designing probabilistic neural network ensembles to describe the high-dimensional dexterous hand dynamics and introducing an asynchronous MPC policy to meet the control frequency requireme…

    Submitted 30 April, 2025; originally announced April 2025.

  43. arXiv:2504.16074  [pdf, other]

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Jiaming Ji, et al. (29 additional authors not shown)

    Abstract: Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olym…

    Submitted 18 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 34 pages, 12 figures, 7 tables, latest update on 2025/05/18

  44. arXiv:2504.15521  [pdf, other]

    cs.CL

    The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks

    Authors: Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang

    Abstract: As large language models (LLMs) continue to advance in linguistic capabilities, robust multilingual evaluation has become essential for promoting equitable technological progress. This position paper examines over 2,000 multilingual (non-English) benchmarks from 148 countries, published between 2021 and 2024, to evaluate past, present, and future practices in multilingual benchmarking. Our finding…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: work in progress; 22 pages, 8 figures, 3 tables

  45. Shape-Guided Clothing Warping for Virtual Try-On

    Authors: Xiaoyu Han, Shunyuan Zheng, Zonglin Li, Chenyang Wang, Xin Sun, Quanling Meng

    Abstract: Image-based virtual try-on aims to seamlessly fit in-shop clothing to a person image while maintaining pose consistency. Existing methods commonly employ the thin plate spline (TPS) transformation or appearance flow to deform in-shop clothing for aligning with the person's body. Despite their promising performance, these methods often lack precise control over fine details, leading to inconsistenc…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by ACM MM 2024. The code is available at https://github.com/xyhanHIT/SCW-VTON

  46. arXiv:2504.14948  [pdf, ps, other]

    cs.GT

    Mechanism Design for Auctions with Externalities on Budgets

    Authors: Yusen Zheng, Yukun Cheng, Chenyang Xu, Xiaotie Deng

    Abstract: This paper studies mechanism design for auctions with externalities on budgets, a novel setting where the budgets that bidders commit are adjusted due to the externality of the competitors' allocation outcomes-a departure from traditional auctions with fixed budgets. This setting is motivated by real-world scenarios, for example, participants may increase their budgets in response to competitors'…

    Submitted 21 April, 2025; originally announced April 2025.

  47. arXiv:2504.14600  [pdf, ps, other]

    cs.CV

    NTIRE 2025 Challenge on Real-World Face Restoration: Methods and Results

    Authors: Zheng Chen, Jingkai Wang, Kai Liu, Jue Gong, Lei Sun, Zongwei Wu, Radu Timofte, Yulun Zhang, Jianxing Zhang, Jinlong Wu, Jun Wang, Zheng Xie, Hakjae Jeon, Suejin Han, Hyung-Ju Chun, Hyunhee Park, Zhicun Yin, Junjie Chen, Ming Liu, Xiaoming Li, Chao Zhou, Wangmeng Zuo, Weixia Zhang, Dingquan Li, Kede Ma, et al. (29 additional authors not shown)

    Abstract: This paper provides a review of the NTIRE 2025 challenge on real-world face restoration, highlighting the proposed solutions and the resulting outcomes. The challenge focuses on generating natural, realistic outputs while maintaining identity consistency. Its goal is to advance state-of-the-art solutions for perceptual quality and realism, without imposing constraints on computational resources or…

    Submitted 20 April, 2025; originally announced April 2025.

    Comments: NTIRE 2025 webpage: https://www.cvlai.net/ntire/2025. Code: https://github.com/zhengchen1999/NTIRE2025_RealWorld_Face_Restoration

  48. arXiv:2504.13853  [pdf, other]

    q-bio.BM cs.AI

    GenShin:geometry-enhanced structural graph embodies binding pose can better predicting compound-protein interaction affinity

    Authors: Pingfei Zhu, Chenyang Zhao, Haishi Zhao, Bo Yang

    Abstract: AI-powered drug discovery typically relies on the successful prediction of compound-protein interactions, which are pivotal for the evaluation of designed compound molecules in structure-based drug design and represent a core challenge in the field. However, accurately predicting compound-protein affinity via regression models usually requires adequate-binding pose, which are derived from costly…

    Submitted 16 March, 2025; originally announced April 2025.

    Comments: 11 pages, 3 figures

  49. arXiv:2504.11015  [pdf, other]

    cs.CV

    AnimeDL-2M: Million-Scale AI-Generated Anime Image Detection and Localization in Diffusion Era

    Authors: Chenyang Zhu, Xing Zhang, Yuyang Sun, Ching-Chun Chang, Isao Echizen

    Abstract: Recent advances in image generation, particularly diffusion models, have significantly lowered the barrier for creating sophisticated forgeries, making image manipulation detection and localization (IMDL) increasingly challenging. While prior work in IMDL has focused largely on natural images, the anime domain remains underexplored-despite its growing vulnerability to AI-generated forgeries. Misre…

    Submitted 23 May, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

    Comments: 8+2 pages; updated Figures 3, 4, and 5 after adding real images to the detection task tests

  50. arXiv:2504.09819  [pdf, other]

    cs.CV

    Density-based Object Detection in Crowded Scenes

    Authors: Chenyang Zhao, Jia Wan, Antoni B. Chan

    Abstract: Compared with the generic scenes, crowded scenes contain highly-overlapped instances, which result in: 1) more ambiguous anchors during training of object detectors, and 2) more predictions are likely to be mistakenly suppressed in post-processing during inference. To address these problems, we propose two new strategies, density-guided anchors (DGA) and density-guided NMS (DG-NMS), which uses obj…

    Submitted 13 April, 2025; originally announced April 2025.