Showing 1–50 of 425 results for author: Yuan, C

Searching in archive cs.
  1. arXiv:2507.00992  [pdf, ps, other]

    cs.CV

    UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis

    Authors: Yuanrui Wang, Cong Han, Yafei Li, Zhipeng Jin, Xiawei Li, SiNan Du, Wen Tao, Yi Yang, Shuanglong Li, Chun Yuan, Liu Lin

    Abstract: Text-to-image generation has greatly advanced content creation, yet accurately rendering visual text remains a key challenge due to blurred glyphs, semantic drift, and limited style control. Existing methods often rely on pre-rendered glyph images as conditions, but these struggle to retain original font styles and color cues, necessitating complex multi-branch designs that increase model overhead…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

    Comments: Accepted by ICCV 2025

  2. arXiv:2506.21049  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    A Semi-supervised Scalable Unified Framework for E-commerce Query Classification

    Authors: Chunyuan Yuan, Chong Zhang, Zheng Fang, Ming Pang, Xue Jiang, Changping Peng, Zhangang Lin, Ching Law

    Abstract: Query classification, including multiple subtasks such as intent and category prediction, is vital to e-commerce applications. E-commerce queries are usually short and lack context, and the information between labels cannot be used, resulting in insufficient prior information for modeling. Most existing industrial query classification methods rely on users' posterior click behavior to construct tr…

    Submitted 26 June, 2025; originally announced June 2025.

    Comments: Accepted by ACL 2025

  3. arXiv:2506.17198  [pdf, ps, other]

    cs.RO cs.CV

    Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation

    Authors: Jianglong Ye, Keyi Wang, Chengjing Yuan, Ruihan Yang, Yiquan Li, Jiyue Zhu, Yuzhe Qin, Xueyan Zou, Xiaolong Wang

    Abstract: Generating large-scale demonstrations for dexterous hand manipulation remains challenging, and several approaches have been proposed in recent years to address this. Among them, generative models have emerged as a promising paradigm, enabling the efficient creation of diverse and physically plausible demonstrations. In this paper, we introduce Dex1B, a large-scale, diverse, and high-quality demons…

    Submitted 20 June, 2025; originally announced June 2025.

    Comments: Accepted to RSS 2025. Project page: https://jianglongye.com/dex1b

  4. arXiv:2506.14791  [pdf, ps, other]

    cs.CV cs.CL cs.LG

    SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection

    Authors: Jingxuan Zhou, Yuehao Wu, Yibo Zhang, Yeyubei Zhang, Yunchong Liu, Bolin Huang, Chunhong Yuan

    Abstract: Aiming at the problem of difficulty in accurately identifying graphical implicit correlations in multimodal irony detection tasks, this paper proposes a Semantic Irony Recognition Network (SemIRNet). The model contains three main innovations: (1) The ConceptNet knowledge base is introduced for the first time to acquire conceptual knowledge, which enhances the model's common-sense reasoning ability…

    Submitted 28 May, 2025; originally announced June 2025.

    Comments: 5 pages, 3 figures

  5. arXiv:2506.14625  [pdf, ps, other]

    cs.CL cs.AI

    Probabilistic Aggregation and Targeted Embedding Optimization for Collective Moral Reasoning in Large Language Models

    Authors: Chenchen Yuan, Zheyu Zhang, Shuo Yang, Bardh Prenkaj, Gjergji Kasneci

    Abstract: Large Language Models (LLMs) have shown impressive moral reasoning abilities. Yet they often diverge when confronted with complex, multi-factor moral dilemmas. To address these discrepancies, we propose a framework that synthesizes multiple LLMs' moral judgments into a collectively formulated moral judgment, realigning models that deviate significantly from this consensus. Our aggregation mechanis…

    Submitted 18 June, 2025; v1 submitted 17 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 (Findings)

  6. arXiv:2506.12479  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.DC eess.SP

    AI Flow: Perspectives, Scenarios, and Approaches

    Authors: Hongjun An, Wenhan Hu, Sida Huang, Siqi Huang, Ruanjun Li, Yuanzhi Liang, Jiawei Shao, Yiliang Song, Zihan Wang, Cheng Yuan, Chi Zhang, Hongyuan Zhang, Wenhao Zhuang, Xuelong Li

    Abstract: Pioneered by the foundational information theory by Claude Shannon and the visionary framework of machine intelligence by Alan Turing, the convergent evolution of information and communication technologies (IT/CT) has created an unbroken wave of connectivity and computation. This synergy has sparked a technological revolution, now reaching its peak with large artificial intelligence (AI) models th…

    Submitted 3 July, 2025; v1 submitted 14 June, 2025; originally announced June 2025.

    Comments: Authors are with Institute of Artificial Intelligence (TeleAI), China Telecom, China. Author names are listed alphabetically by surname. This work was conducted at TeleAI, facilitated by Dr. Jiawei Shao under the leadership of Prof. Xuelong Li. The corresponding author is Prof. Xuelong Li, the CTO and Chief Scientist of China Telecom

  7. arXiv:2506.12103  [pdf, other]

    cs.AI cs.CY cs.LG

    The Amazon Nova Family of Models: Technical Report and Model Card

    Authors: Amazon AGI, Aaron Langford, Aayush Shah, Abhanshu Gupta, Abhimanyu Bhatter, Abhinav Goyal, Abhinav Mathur, Abhinav Mohanty, Abhishek Kumar, Abhishek Sethi, Abi Komma, Abner Pena, Achin Jain, Adam Kunysz, Adam Opyrchal, Adarsh Singh, Aditya Rawal, Adok Achar Budihal Prasad, Adrià de Gispert, Agnika Kumar, Aishwarya Aryamane, Ajay Nair, Akilan M, Akshaya Iyengar, Akshaya Vishnu Kudlu Shanbhogue, et al. (761 additional authors not shown)

    Abstract: We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance. Amazon Nova Pro is a highly-capable multimodal model with the best combination of accuracy, speed, and cost for a wide range of tasks. Amazon Nova Lite is a low-cost multimodal model that is lightning fast for processing images, video, documents…

    Submitted 17 March, 2025; originally announced June 2025.

    Comments: 48 pages, 10 figures

    Report number: 20250317

  8. arXiv:2506.10487   

    cs.IR

    SHORE: A Long-term User Lifetime Value Prediction Model in Digital Games

    Authors: Congde Yuan

    Abstract: In digital gaming, long-term user lifetime value (LTV) prediction is essential for monetization strategy, yet presents major challenges due to delayed payment behavior, sparse early user data, and the presence of high-value outliers. While existing models typically rely on either short-cycle observations or strong distributional assumptions, such approaches often underestimate long-term value or s…

    Submitted 12 June, 2025; originally announced June 2025.

    Comments: This version has been removed by arXiv administrators as the submitter did not have the right to agree to the license at the time of submission

  9. arXiv:2506.08427  [pdf, other]

    cs.CL

    Know-MRI: A Knowledge Mechanisms Revealer&Interpreter for Large Language Models

    Authors: Jiaxiang Liu, Boxuan Xing, Chenhao Yuan, Chenxiang Zhang, Di Wu, Xiusheng Huang, Haida Yu, Chuhan Lang, Pengfei Cao, Jun Zhao, Kang Liu

    Abstract: As large language models (LLMs) continue to advance, there is a growing urgency to enhance the interpretability of their internal knowledge mechanisms. Consequently, many interpretation methods have emerged, aiming to unravel the knowledge mechanisms of LLMs from various perspectives. However, current interpretation methods differ in input data formats and interpreting outputs. The tools integrati…

    Submitted 10 June, 2025; originally announced June 2025.

  10. arXiv:2506.07375  [pdf, ps, other]

    cs.CV

    DINO-CoDT: Multi-class Collaborative Detection and Tracking with Vision Foundation Models

    Authors: Xunjie He, Christina Dao Wen Lee, Meiling Wang, Chengran Yuan, Zefan Huang, Yufeng Yue, Marcelo H. Ang Jr

    Abstract: Collaborative perception plays a crucial role in enhancing environmental understanding by expanding the perceptual range and improving robustness against sensor failures, which primarily involves collaborative 3D detection and tracking tasks. The former focuses on object recognition in individual frames, while the latter captures continuous instance tracklets over time. However, existing works in…

    Submitted 8 June, 2025; originally announced June 2025.

  11. arXiv:2506.06826  [pdf, ps, other]

    cs.CV cs.AI

    Controllable Coupled Image Generation via Diffusion Models

    Authors: Chenfei Yuan, Nanshan Jia, Hangqi Li, Peter W. Glynn, Zeyu Zheng

    Abstract: We provide an attention-level control method for the task of coupled image generation, where "coupled" means that multiple simultaneously generated images are expected to have the same or very similar backgrounds. While backgrounds coupled, the centered objects in the generated images are still expected to enjoy the flexibility raised from different text prompts. The proposed method disentangles t…

    Submitted 7 June, 2025; originally announced June 2025.

  12. JGS2: Near Second-order Converging Jacobi/Gauss-Seidel for GPU Elastodynamics

    Authors: Lei Lan, Zixuan Lu, Chun Yuan, Weiwei Xu, Hao Su, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: In parallel simulation, convergence and parallelism are often seen as inherently conflicting objectives. Improved parallelism typically entails lighter local computation and weaker coupling, which unavoidably slow the global convergence. This paper presents a novel GPU algorithm that achieves convergence rates comparable to fullspace Newton's method while maintaining good parallelizability just li…

    Submitted 6 June, 2025; originally announced June 2025.

  13. arXiv:2506.03827  [pdf, other]

    cs.CL cs.AI cs.IR

    Multi-objective Aligned Bidword Generation Model for E-commerce Search Advertising

    Authors: Zhenhui Liu, Chunyuan Yuan, Ming Pang, Zheng Fang, Li Yuan, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo, Jingping Shao

    Abstract: Retrieval systems primarily address the challenge of matching user queries with the most relevant advertisements, playing a crucial role in e-commerce search advertising. The diversity of user needs and expressions often produces massive long-tail queries that cannot be matched with merchant bidwords or product titles, which results in some advertisements not being recalled, ultimately harming use…

    Submitted 4 June, 2025; originally announced June 2025.

    Comments: Accepted by SIGIR2025

  14. arXiv:2506.03737  [pdf, ps, other]

    cs.CV cs.AI

    ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices

    Authors: Hao Yu, Tangyu Jiang, Shuning Jia, Shannan Yan, Shunning Liu, Haolong Qian, Guanghao Li, Shuting Dong, Huaisong Zhang, Chun Yuan

    Abstract: The Transformer architecture has revolutionized various regions since it was proposed, and its effectiveness largely depends on the ability to encode positional information. Traditional position encoding methods exhibit significant limitations due to lack of robustness and flexibility of position. Therefore, Rotary Positional Encoding (RoPE) was proposed to alleviate these issues, which integrates…

    Submitted 4 June, 2025; originally announced June 2025.

  15. arXiv:2505.24680  [pdf, ps, other]

    cs.CL

    A Simple Linear Patch Revives Layer-Pruned Large Language Models

    Authors: Xinrui Chen, Haoli Bai, Tao Yuan, Ruikang Liu, Kang Zhao, Xianzhi Yu, Lu Hou, Tian Guan, Yonghong He, Chun Yuan

    Abstract: Layer pruning has become a popular technique for compressing large language models (LLMs) due to its simplicity. However, existing layer pruning methods often suffer from significant performance drops. We identify that this degradation stems from the mismatch of activation magnitudes across layers and tokens at the pruning interface. To address this, we propose LinearPatch, a simple plug-and-play…

    Submitted 30 May, 2025; originally announced May 2025.

  16. arXiv:2505.24181  [pdf, other]

    cs.AI

    SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought

    Authors: Guanghao Li, Wenhao Jiang, Mingfeng Chen, Yan Li, Hao Yu, Shuting Dong, Tao Ren, Ming Tang, Chun Yuan

    Abstract: Chain of Thought (CoT) prompting improves the reasoning performance of large language models (LLMs) by encouraging step by step thinking. However, CoT-based methods depend on intermediate reasoning steps, which limits scalability and generalization. Recent work explores recursive reasoning, where LLMs reuse internal layers across iterations to refine latent representations without explicit CoT sup…

    Submitted 29 May, 2025; originally announced May 2025.

  17. arXiv:2505.23524  [pdf, ps, other]

    cs.CV

    CLIP-AE: CLIP-assisted Cross-view Audio-Visual Enhancement for Unsupervised Temporal Action Localization

    Authors: Rui Xia, Dan Jiang, Quan Zhang, Ke Zhang, Chun Yuan

    Abstract: Temporal Action Localization (TAL) has garnered significant attention in information retrieval. Existing supervised or weakly supervised methods heavily rely on labeled temporal boundaries and action categories, which are labor-intensive and time-consuming. Consequently, unsupervised temporal action localization (UTAL) has gained popularity. However, current methods face two main challenges: 1) Cl…

    Submitted 4 June, 2025; v1 submitted 29 May, 2025; originally announced May 2025.

  18. arXiv:2505.22963  [pdf, ps, other]

    cs.NI cs.LG

    Agile Orchestration at Will: An Entire Smart Service-Based Security Architecture Towards 6G

    Authors: Zhuoran Duan, Guoshun Nan, Rushan Li, Zijun Wang, Lihua Xiong, Chaoying Yuan, Guorong Liu, Hui Xu, Qimei Cui, Xiaofeng Tao, Tony Q. S. Quek

    Abstract: The upcoming 6G will fundamentally reshape mobile networks beyond communications, unlocking a multitude of applications that were once considered unimaginable. Meanwhile, security and resilience are especially highlighted in the 6G design principles. However, safeguarding 6G networks will be quite challenging due to various known and unknown threats from highly heterogeneous networks and diversifi…

    Submitted 18 June, 2025; v1 submitted 28 May, 2025; originally announced May 2025.

    Comments: Accepted by IEEE Wireless Communications Magazine

  19. arXiv:2505.19892  [pdf, other]

    cs.AI

    Unifying Multimodal Large Language Model Capabilities and Modalities via Model Merging

    Authors: Yongxian Wei, Runxi Cheng, Weike Jin, Enneng Yang, Li Shen, Lu Hou, Sinan Du, Chun Yuan, Xiaochun Cao, Dacheng Tao

    Abstract: While foundation models update slowly due to resource-intensive training requirements, domain-specific models evolve between updates. Model merging aims to combine multiple expert models into a single, more capable model, thereby reducing storage and serving costs while supporting decentralized model development. Despite its potential, previous studies have primarily focused on merging visual clas…

    Submitted 26 May, 2025; originally announced May 2025.

  20. arXiv:2505.19849  [pdf, other]

    cs.IR

    HIT Model: A Hierarchical Interaction-Enhanced Two-Tower Model for Pre-Ranking Systems

    Authors: Haoqiang Yang, Congde Yuan, Kun Bai, Mengzhuo Guo, Wei Yang, Chao Zhou

    Abstract: Online display advertising platforms rely on pre-ranking systems to efficiently filter and prioritize candidate ads from large corpora, balancing relevance to users with strict computational constraints. The prevailing two-tower architecture, though highly efficient due to its decoupled design and pre-caching, suffers from cross-domain interaction and coarse similarity metrics, undermining its cap…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 7 pages

  21. arXiv:2505.17796  [pdf, ps, other]

    cs.CV cs.AI cs.IR

    DetailFusion: A Dual-branch Framework with Detail Enhancement for Composed Image Retrieval

    Authors: Yuxin Yang, Yinan Zhou, Yuxin Chen, Ziqi Zhang, Zongyang Ma, Chunfeng Yuan, Bing Li, Lin Song, Jun Gao, Peng Li, Weiming Hu

    Abstract: Composed Image Retrieval (CIR) aims to retrieve target images from a gallery based on a reference image and modification text as a combined query. Recent approaches focus on balancing global information from two modalities and encode the query into a unified feature for retrieval. However, due to insufficient attention to fine-grained details, these coarse fusion methods often struggle with handli…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 20 pages, 6 figures

  22. arXiv:2505.12844  [pdf, ps, other]

    cs.AI cs.RO

    AGI-Elo: How Far Are We From Mastering A Task?

    Authors: Shuo Sun, Yimin Zhao, Christina Dao Wen Lee, Jiawei Sun, Chengran Yuan, Zefan Huang, Dongen Li, Justin KW Yeoh, Alok Prakash, Thomas W. Malone, Marcelo H. Ang Jr

    Abstract: As the field progresses toward Artificial General Intelligence (AGI), there is a pressing need for more comprehensive and insightful evaluation frameworks that go beyond aggregate performance metrics. This paper introduces a unified rating system that jointly models the difficulty of individual test cases and the competency of AI models (or humans) across vision, language, and action domains. Unli…

    Submitted 24 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.

  23. arXiv:2505.12667  [pdf, other]

    cs.CV

    Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking

    Authors: Zihan Su, Xuerui Qiu, Hongbin Xu, Tangyu Jiang, Junhao Zhuang, Chun Yuan, Ming Li, Shengfeng He, Fei Richard Yu

    Abstract: The explosive growth of generative video models has amplified the demand for reliable copyright preservation of AI-generated content. Despite its popularity in image synthesis, invisible generative watermarking remains largely underexplored in video generation. To address this gap, we propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. M…

    Submitted 18 May, 2025; originally announced May 2025.

  24. arXiv:2505.11922  [pdf, ps, other]

    cs.CL

    Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning

    Authors: Yuheng Lu, ZiMeng Bai, Caixia Yuan, Huixing Jiang, Xiaojie Wang

    Abstract: Large language models (LLMs) exhibit remarkable capabilities in handling natural language tasks; however, they may struggle to consistently follow complex instructions including those involve multiple constraints. Post-training LLMs using supervised fine-tuning (SFT) is a standard approach to improve their ability to follow instructions. In addressing complex instruction following, existing effort…

    Submitted 17 May, 2025; originally announced May 2025.

  25. arXiv:2505.08735  [pdf, other]

    cs.LG

    Preference Optimization for Combinatorial Optimization Problems

    Authors: Mingjun Pan, Guanquan Lin, You-Wei Luo, Bin Zhu, Zhien Dai, Lijun Sun, Chun Yuan

    Abstract: Reinforcement Learning (RL) has emerged as a powerful tool for neural combinatorial optimization, enabling models to learn heuristics that solve complex problems without requiring expert knowledge. Despite significant progress, existing RL approaches face challenges such as diminishing reward signals and inefficient exploration in vast combinatorial action spaces, leading to inefficiency. In this…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: This paper has been accepted by ICML 2025

  26. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng, et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed with a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  27. arXiv:2505.06607  [pdf, ps, other]

    cs.CL

    Boosting Neural Language Inference via Cascaded Interactive Reasoning

    Authors: Min Li, Chun Yuan

    Abstract: Natural Language Inference (NLI) focuses on ascertaining the logical relationship (entailment, contradiction, or neutral) between a given premise and hypothesis. This task presents significant challenges due to inherent linguistic features such as diverse phrasing, semantic complexity, and contextual nuances. While Pre-trained Language Models (PLMs) built upon the Transformer architecture have yie…

    Submitted 10 May, 2025; originally announced May 2025.

  28. arXiv:2505.06605  [pdf, ps, other]

    cs.CL

    Using External knowledge to Enhanced PLM for Semantic Matching

    Authors: Min Li, Chun Yuan

    Abstract: Modeling semantic relevance has always been a challenging and critical task in natural language processing. In recent years, with the emergence of massive amounts of annotated data, it has become feasible to train complex models, such as neural network-based reasoning models. These models have shown excellent performance in practical applications and have achieved the current state-of-the-art perfo…

    Submitted 10 May, 2025; originally announced May 2025.

  29. arXiv:2505.04013  [pdf, other]

    cs.LO cs.DM

    SAT-Solving the Poset Cover Problem

    Authors: Chih-Cheng Rex Yuan, Bow-Yaw Wang

    Abstract: The poset cover problem seeks a minimum set of partial orders whose linear extensions cover a given set of linear orders. Recognizing its NP-completeness, we devised a non-trivial reduction to the Boolean satisfiability problem using a technique we call swap graphs, which avoids the complexity explosion of the naive method. By leveraging modern SAT solvers, we efficiently solve instances with reas…

    Submitted 6 May, 2025; originally announced May 2025.

  30. arXiv:2505.01974  [pdf, other]

    cs.RO

    KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation

    Authors: Di Zhang, Chengbo Yuan, Chuan Wen, Hai Zhang, Junqiao Zhao, Yang Gao

    Abstract: Collecting demonstrations enriched with fine-grained tactile information is critical for dexterous manipulation, particularly in contact-rich tasks that require precise force control and physical interaction. While prior works primarily focus on teleoperation or video-based retargeting, they often suffer from kinematic mismatches and the absence of real-time tactile feedback, hindering the acquisi…

    Submitted 3 May, 2025; originally announced May 2025.

  31. arXiv:2504.21634  [pdf, ps, other]

    cs.CY cs.AI cs.LG

    Quantitative Auditing of AI Fairness with Differentially Private Synthetic Data

    Authors: Chih-Cheng Rex Yuan, Bow-Yaw Wang

    Abstract: Fairness auditing of AI systems can identify and quantify biases. However, traditional auditing using real-world data raises security and privacy concerns. It exposes auditors to security risks as they become custodians of sensitive information and targets for cyberattacks. Privacy risks arise even without direct breaches, as data analyses can inadvertently expose confidential information. To addr…

    Submitted 30 April, 2025; originally announced April 2025.

  32. arXiv:2504.19074  [pdf, other]

    cs.CV cs.LG

    Dual-Branch Residual Network for Cross-Domain Few-Shot Hyperspectral Image Classification with Refined Prototype

    Authors: Anyong Qin, Chaoqi Yuan, Qiang Li, Feng Yang, Tiecheng Song, Chenqiang Gao

    Abstract: Convolutional neural networks (CNNs) are effective for hyperspectral image (HSI) classification, but their 3D convolutional structures introduce high computational costs and limited generalization in few-shot scenarios. Domain shifts caused by sensor differences and environmental variations further hinder cross-dataset adaptability. Metric-based few-shot learning (FSL) prototype networks mitigate…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: 5 pages, 2 figures. IEEE Geoscience and Remote Sensing Letters (2025)

  33. arXiv:2504.13440  [pdf, other]

    cs.CV

    Temporal Propagation of Asymmetric Feature Pyramid for Surgical Scene Segmentation

    Authors: Cheng Yuan, Yutong Ban

    Abstract: Surgical scene segmentation is crucial for robot-assisted laparoscopic surgery understanding. Current approaches face two challenges: (i) static image limitations including ambiguous local feature similarities and fine-grained structural details, and (ii) dynamic video complexities arising from rapid instrument motion and persistent visual occlusions. While existing methods mainly focus on spatial…

    Submitted 17 April, 2025; originally announced April 2025.

  34. arXiv:2504.12240  [pdf, other]

    cs.CV

    Cobra: Efficient Line Art COlorization with BRoAder References

    Authors: Junhao Zhuang, Lingen Li, Xuan Ju, Zhaoyang Zhang, Chun Yuan, Ying Shan

    Abstract: The comic production industry requires reference-based line art colorization with high accuracy, efficiency, contextual consistency, and flexible control. A comic page often involves diverse characters, objects, and backgrounds, which complicates the coloring process. Despite advancements in diffusion models for image generation, their application in line art colorization remains limited, facing c…

    Submitted 6 May, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: Project page with code: https://zhuang2002.github.io/Cobra/

  35. arXiv:2504.11798  [pdf, other]

    cs.CV

    Neighbor-Based Feature and Index Enhancement for Person Re-Identification

    Authors: Chao Yuan, Tianyi Zhang, Guanglin Niu

    Abstract: Person re-identification (Re-ID) aims to match the same pedestrian in a large gallery with different cameras and views. Enhancing the robustness of the extracted feature representations is a main challenge in Re-ID. Existing methods usually improve feature representation by improving model architecture, but most methods ignore the potential contextual information, which limits the effectiveness of…

    Submitted 16 April, 2025; originally announced April 2025.

    Comments: This paper has been accepted for publication in the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

  36. arXiv:2504.11658  [pdf, other]

    cs.IR cs.AI

    Improving LLM Interpretability and Performance via Guided Embedding Refinement for Sequential Recommendation

    Authors: Nanshan Jia, Chenfei Yuan, Yuhang Wu, Zeyu Zheng

    Abstract: The fast development of Large Language Models (LLMs) offers growing opportunities to further improve sequential recommendation systems. Yet for some practitioners, integrating LLMs to their existing base recommendation systems raises questions about model interpretability, transparency and related safety. To partly alleviate challenges from these questions, we propose guided embedding refinement,…

    Submitted 15 April, 2025; originally announced April 2025.

  37. arXiv:2504.10329  [pdf, other]

    cs.CV

    InstructEngine: Instruction-driven Text-to-Image Alignment

    Authors: Xingyu Lu, Yuhang Hu, YiFan Zhang, Kaiyu Jiang, Changyi Liu, Tianke Zhang, Jinpeng Wang, Chun Yuan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang

    Abstract: Reinforcement Learning from Human/AI Feedback (RLHF/RLAIF) has been extensively utilized for preference alignment of text-to-image models. Existing methods face certain limitations in terms of both data and algorithm. For training data, most approaches rely on manual annotated preference data, either by directly fine-tuning the generators or by training reward models to provide training signals. H…

    Submitted 21 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

    Comments: 8 pages, 7 figures

  38. arXiv:2504.09461  [pdf, other]

    cs.RO cs.AR

    ADDT -- A Digital Twin Framework for Proactive Safety Validation in Autonomous Driving Systems

    Authors: Bo Yu, Chaoran Yuan, Zishen Wan, Jie Tang, Fadi Kurdahi, Shaoshan Liu

    Abstract: Autonomous driving systems continue to face safety-critical failures, often triggered by rare and unpredictable corner cases that evade conventional testing. We present the Autonomous Driving Digital Twin (ADDT) framework, a high-fidelity simulation platform designed to proactively identify hidden faults, evaluate real-time performance, and validate safety before deployment. ADDT combines realisti…

    Submitted 13 April, 2025; originally announced April 2025.

  39. arXiv:2504.09103  [pdf, ps, other]

    cs.RO

    IMPACT: Behavioral Intention-aware Multimodal Trajectory Prediction with Adaptive Context Trimming

    Authors: Jiawei Sun, Xibin Yue, Jiahui Li, Tianle Shen, Chengran Yuan, Shuo Sun, Sheng Guo, Quanyun Zhou, Marcelo H Ang Jr

    Abstract: While most prior research has focused on improving the precision of multimodal trajectory predictions, the explicit modeling of multimodal behavioral intentions (e.g., yielding, overtaking) remains relatively underexplored. This paper proposes a unified framework that jointly predicts both behavioral intentions and trajectories to enhance prediction accuracy, interpretability, and efficiency. Spec…

    Submitted 26 June, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

    Comments: under review

  40. arXiv:2504.04823  [pdf, other]

    cs.CL cs.AI

    Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

    Authors: Ruikang Liu, Yuxuan Sun, Manyi Zhang, Haoli Bai, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou

    Abstract: Recent advancements in reasoning language models have demonstrated remarkable performance in complex tasks, but their extended chain-of-thought reasoning process increases inference overhead. While quantization has been widely adopted to reduce the inference cost of large language models, its impact on reasoning models remains understudied. In this study, we conduct the first systematic study on q…

    Submitted 7 April, 2025; originally announced April 2025.

  41. arXiv:2504.01403  [pdf, other]

    cs.IR cs.AI cs.CL

    Generative Retrieval and Alignment Model: A New Paradigm for E-commerce Retrieval

    Authors: Ming Pang, Chunyuan Yuan, Xiaoyu He, Zheng Fang, Donghao Xie, Fanyi Qu, Xue Jiang, Changping Peng, Zhangang Lin, Zheng Luo, Jingping Shao

    Abstract: Traditional sparse and dense retrieval methods struggle to leverage general world knowledge and often fail to capture the nuanced features of queries and products. With the advent of large language models (LLMs), industrial search systems have started to employ LLMs to generate identifiers for product retrieval. Commonly used identifiers include (1) static/semantic IDs and (2) product term sets. T…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: Accepted by WWW 2025

  42. arXiv:2503.23022  [pdf, other]

    cs.CV

    MeshCraft: Exploring Efficient and Controllable Mesh Generation with Flow-based DiTs

    Authors: Xianglong He, Junyi Chen, Di Huang, Zexiang Liu, Xiaoshui Huang, Wanli Ouyang, Chun Yuan, Yangguang Li

    Abstract: In the domain of 3D content creation, achieving optimal mesh topology through AI models has long been a pursuit for 3D artists. Previous methods, such as MeshGPT, have explored the generation of ready-to-use 3D objects via mesh auto-regressive techniques. While these methods produce visually impressive results, their reliance on token-by-token predictions in the auto-regressive process leads to se…

    Submitted 29 March, 2025; originally announced March 2025.

  43. arXiv:2503.21732  [pdf, other]

    cs.CV

    SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling

    Authors: Xianglong He, Zi-Xin Zou, Chia-Hao Chen, Yuan-Chen Guo, Ding Liang, Chun Yuan, Wanli Ouyang, Yan-Pei Cao, Yangguang Li

    Abstract: Creating high-fidelity 3D meshes with arbitrary topology, including open surfaces and complex interiors, remains a significant challenge. Existing implicit field methods often require costly and detail-degrading watertight conversion, while other approaches struggle with high resolutions. This paper introduces SparseFlex, a novel sparse-structured isosurface representation that enables differentia…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Project page: https://xianglonghe.github.io/TripoSF

  44. arXiv:2503.19926  [pdf, other]

    cs.SI cs.LG

    Unifying Structural Proximity and Equivalence for Enhanced Dynamic Network Embedding

    Authors: Suchanuch Piriyasatit, Chaohao Yuan, Ercan Engin Kuruoglu

    Abstract: Dynamic network embedding methods transform nodes in a dynamic network into low-dimensional vectors while preserving network characteristics, facilitating tasks such as node classification and community detection. Several embedding methods have been proposed to capture structural proximity among nodes in a network, where densely connected communities are preserved, while others have been proposed…

    Submitted 14 March, 2025; originally announced March 2025.

  45. arXiv:2503.19763  [pdf, other]

    stat.ML cs.LG math.ST

    Interpretable Deep Regression Models with Interval-Censored Failure Time Data

    Authors: Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song, Zhao Chen

    Abstract: Deep neural networks (DNNs) have become powerful tools for modeling complex data structures through sequentially integrating simple functions in each hidden layer. In survival analysis, recent advances of DNNs primarily focus on enhancing model capabilities, especially in exploring nonlinear covariate effects under right censoring. However, deep learning methods for interval-censored data, where t…

    Submitted 25 March, 2025; originally announced March 2025.

  46. arXiv:2503.18738  [pdf, other]

    cs.RO

    RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

    Authors: Chengbo Yuan, Suraj Joshi, Shaoting Zhu, Hang Su, Hang Zhao, Yang Gao

    Abstract: Visual augmentation has become a crucial technique for enhancing the visual robustness of imitation learning. However, existing methods are often limited by prerequisites such as camera calibration or the need for controlled environments (e.g., green screen setups). In this work, we introduce RoboEngine, the first plug-and-play visual robot data augmentation toolkit. For the first time, users can…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Project Page: https://roboengine.github.io/

  47. arXiv:2503.12926  [pdf, other]

    eess.SP cs.CV

    Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference

    Authors: Cheng Yuan, Zhening Liu, Jiashu Lv, Jiawei Shao, Yufei Jiang, Jun Zhang, Xuelong Li

    Abstract: With the rapid development of large multimodal models (LMMs), multimodal understanding applications are emerging. As most LMM inference requests originate from edge devices with limited computational capabilities, the predominant inference pipeline involves directly forwarding the input data to an edge server which handles all computations. However, this approach introduces high transmission laten…

    Submitted 17 March, 2025; originally announced March 2025.

  48. arXiv:2503.12366  [pdf, other]

    cs.LG q-bio.NC

    ASD Classification on Dynamic Brain Connectome using Temporal Random Walk with Transformer-based Dynamic Network Embedding

    Authors: Suchanuch Piriyasatit, Chaohao Yuan, Ercan Engin Kuruoglu

    Abstract: Autism Spectrum Disorder (ASD) is a complex neurological condition characterized by varied developmental impairments, especially in communication and social interaction. Accurate and early diagnosis of ASD is crucial for effective intervention, which is enhanced by richer representations of brain activity. The brain functional connectome, which refers to the statistical relationships between diffe…

    Submitted 16 March, 2025; originally announced March 2025.

  49. arXiv:2503.11073  [pdf, other]

    cs.CV

    Perceive, Understand and Restore: Real-World Image Super-Resolution with Autoregressive Multimodal Generative Models

    Authors: Hongyang Wei, Shuaizheng Liu, Chun Yuan, Lei Zhang

    Abstract: By leveraging the generative priors from pre-trained text-to-image diffusion models, significant progress has been made in real-world image super-resolution (Real-ISR). However, these methods tend to generate inaccurate and unnatural reconstructions in complex and/or heavily degraded scenes, primarily due to their limited perception and understanding capability of the input low-quality image. To a…

    Submitted 14 March, 2025; originally announced March 2025.

  50. arXiv:2503.08099  [pdf, ps, other]

    cs.LG

    Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors

    Authors: Runxi Cheng, Feng Xiong, Yongxian Wei, Wanyun Zhu, Chun Yuan

    Abstract: Model merging seeks to integrate task-specific expert models into a unified architecture while preserving multi-task generalization capabilities, yet parameter interference between constituent models frequently induces performance degradation. Although prior work has explored many merging strategies, resolving interference without additional data for retraining or test-time computation remains cha…

    Submitted 11 June, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: 23 pages, 13 figures, 12 tables