Showing 1–50 of 834 results for author: Sun, W

  1. arXiv:2505.09110  [pdf, ps, other]

    cs.CR cs.DC cs.LG

    Toward Malicious Clients Detection in Federated Learning

    Authors: Zhihao Dou, Jiaqi Wang, Wei Sun, Zhuqing Liu, Minghong Fang

    Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global machine learning model without sharing their raw data. However, the decentralized nature of FL introduces vulnerabilities, particularly to poisoning attacks, where malicious clients manipulate their local models to disrupt the training process. While Byzantine-robust aggregation rules have been developed to mitigate…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: To appear in ACM ASIACCS 2025

  2. arXiv:2505.08783  [pdf, ps, other]

    cs.LG cs.AI cs.CL math.NA

    CodePDE: An Inference Framework for LLM-driven PDE Solver Generation

    Authors: Shanda Li, Tanya Marwah, Junhong Shen, Weiwei Sun, Andrej Risteski, Yiming Yang, Ameet Talwalkar

    Abstract: Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability. In this work, we frame PDE solving as a code generation task and in…

    Submitted 13 May, 2025; originally announced May 2025.

  3. arXiv:2505.07634  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

    Authors: Jian Liu, Xiongtao Shi, Thai Duy Nguyen, Haitian Zhang, Tianxiang Zhang, Wei Sun, Yanjie Li, Athanasios V. Vasilakos, Giovanni Iacca, Arshad Ali Khan, Arvind Kumar, Jae Won Cho, Ajmal Mian, Lihua Xie, Erik Cambria, Lin Wang

    Abstract: The rapid evolution of artificial intelligence (AI) has shifted from static, data-driven models to dynamic systems capable of perceiving and interacting with real-world environments. Despite advancements in pattern recognition and symbolic reasoning, current AI systems, such as large language models, remain disembodied, unable to physically engage with the world. This limitation has driven the ris…

    Submitted 14 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

    Comments: 51 pages, 17 figures, 9 tables

  4. arXiv:2505.06977  [pdf, other]

    cs.AI cs.LG

    CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

    Authors: Wenju Sun, Qingyong Li, Yangli-ao Geng, Boyang Li

    Abstract: Multi-task model merging offers a promising paradigm for integrating multiple expert models into a unified model without additional training. Existing state-of-the-art techniques, such as Task Arithmetic and its variants, merge models by accumulating task vectors -- the parameter differences between pretrained and finetuned models. However, task vector accumulation is often hindered by knowledge c…

    Submitted 14 May, 2025; v1 submitted 11 May, 2025; originally announced May 2025.

  5. arXiv:2505.05283  [pdf, ps, other]

    cs.SE cs.AI

    Software Development Life Cycle Perspective: A Survey of Benchmarks for Code Large Language Models and Agents

    Authors: Kaixin Wang, Tianlin Li, Xiaoyu Zhang, Chong Wang, Weisong Sun, Yang Liu, Bin Shi

    Abstract: Code large language models (CodeLLMs) and agents have shown great promise in tackling complex software engineering tasks. Compared to traditional software engineering methods, CodeLLMs and agents offer stronger abilities, and can flexibly process inputs and outputs in both natural and code. Benchmarking plays a crucial role in evaluating the capabilities of CodeLLMs and agents, guiding their develo…

    Submitted 8 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

  6. arXiv:2505.04656  [pdf, other]

    cs.GR

    MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation

    Authors: Zilong Chen, Yikai Wang, Wenqiang Sun, Feng Wang, Yiwen Chen, Huaping Liu

    Abstract: In this paper, we introduce MeshGen, an advanced image-to-3D pipeline that generates high-quality 3D meshes with detailed geometry and physically based rendering (PBR) textures. Addressing the challenges faced by existing 3D native diffusion models, such as suboptimal auto-encoder performance, limited controllability, poor generalization, and inconsistent image-based PBR texturing, MeshGen employs…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: To appear at CVPR 2025 with highlight

  7. arXiv:2505.03901  [pdf, other]

    cs.SE

    Unveiling the Role of ChatGPT in Software Development: Insights from Developer-ChatGPT Interactions on GitHub

    Authors: Ruiyin Li, Peng Liang, Yifei Wang, Yangxiao Cai, Weisong Sun, Zengyang Li

    Abstract: The advent of Large Language Models (LLMs) has introduced a new paradigm in software engineering, with generative AI tools like ChatGPT gaining widespread adoption among developers. While ChatGPT's potential has been extensively discussed, there is limited empirical evidence exploring its real-world usage by developers. This study bridges this gap by conducting a large-scale empirical analysis of…

    Submitted 6 May, 2025; originally announced May 2025.

    Comments: 25 pages, 10 images, 2 tables, Manuscript submitted to a journal (2025)

  8. arXiv:2505.03631  [pdf, other]

    cs.CV

    Breaking Annotation Barriers: Generalized Video Quality Assessment via Ranking-based Self-Supervision

    Authors: Linhan Cao, Wei Sun, Kaiwei Zhang, Yicong Peng, Guangtao Zhai, Xiongkuo Min

    Abstract: Video quality assessment (VQA) is essential for quantifying perceptual quality in various video processing workflows, spanning from camera capture systems to over-the-top streaming platforms. While recent supervised VQA models have made substantial progress, the reliance on manually annotated datasets -- a process that is labor-intensive, costly, and difficult to scale up -- has hindered further o…

    Submitted 7 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  9. arXiv:2505.03344  [pdf, other]

    cs.RO cs.LG

    RIFT: Closed-Loop RL Fine-Tuning for Realistic and Controllable Traffic Simulation

    Authors: Keyu Chen, Wenchao Sun, Hao Cheng, Sifa Zheng

    Abstract: Achieving both realism and controllability in interactive closed-loop traffic simulation remains a key challenge in autonomous driving. Data-driven simulation methods reproduce realistic trajectories but suffer from covariate shift in closed-loop deployment, compounded by simplified dynamics models that further reduce reliability. Conversely, physics-based simulation methods enhance reliable and c…

    Submitted 6 May, 2025; originally announced May 2025.

  10. arXiv:2505.03194  [pdf, ps, other]

    cs.LG

    Convergence Of Consistency Model With Multistep Sampling Under General Data Assumptions

    Authors: Yiding Chen, Yiyi Zhang, Owen Oertell, Wen Sun

    Abstract: Diffusion models accomplish remarkable success in data generation tasks across various domains. However, the iterative sampling process is computationally expensive. Consistency models are proposed to learn consistency functions to map from noise to data directly, which allows one-step fast data generation and multistep sampling to improve sample quality. In this paper, we study the convergence of…

    Submitted 6 May, 2025; originally announced May 2025.

  11. arXiv:2505.03075  [pdf, other]

    cs.IR

    Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models

    Authors: Zhengliang Shi, Lingyong Yan, Weiwei Sun, Yue Feng, Pengjie Ren, Xinyu Ma, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-the-shelf retrievers, lacking end-…

    Submitted 5 May, 2025; originally announced May 2025.

  12. arXiv:2504.21601  [pdf, other]

    math.GT cs.CG cs.DM cs.DS math.CO

    Efficient Decomposition of Forman-Ricci Curvature on Vietoris-Rips Complexes and Data Applications

    Authors: Danillo Barros de Souza, Jonatas Teodomiro, Fernando A. N. Santos, Mengjun Ding, Weiqiang Sun, Mathieu Desroches, Jürgen Jost, Serafim Rodrigues

    Abstract: Discrete Forman-Ricci curvature (FRC) is an efficient tool that characterizes essential geometrical features and associated transitions of real-world networks, extending seamlessly to higher-dimensional computations in simplicial complexes. In this article, we provide two major advancements: First, we give a decomposition for FRC, enabling local computations of FRC. Second, we construct a set-theo…

    Submitted 5 May, 2025; v1 submitted 30 April, 2025; originally announced April 2025.

    MSC Class: 05C85; 52C99; 90C35; 62R40; 68W99; 68T09

  13. arXiv:2504.21308  [pdf, other]

    cs.CV

    AGHI-QA: A Subjective-Aligned Dataset and Metric for AI-Generated Human Images

    Authors: Yunhao Li, Sijing Wu, Wei Sun, Zhichao Zhang, Yucheng Zhu, Zicheng Zhang, Huiyu Duan, Xiongkuo Min, Guangtao Zhai

    Abstract: The rapid development of text-to-image (T2I) generation approaches has attracted extensive interest in evaluating the quality of generated images, leading to the development of various quality assessment methods for general-purpose T2I outputs. However, existing image quality assessment (IQA) methods are limited to providing global quality scores, failing to deliver fine-grained perceptual evaluat…

    Submitted 30 April, 2025; originally announced April 2025.

  14. arXiv:2504.20187  [pdf, other]

    cs.LG cs.AI eess.SY

    AI Recommendation Systems for Lane-Changing Using Adherence-Aware Reinforcement Learning

    Authors: Weihao Sun, Heeseung Bang, Andreas A. Malikopoulos

    Abstract: In this paper, we present an adherence-aware reinforcement learning (RL) approach aimed at seeking optimal lane-changing recommendations within a semi-autonomous driving environment to enhance a single vehicle's travel efficiency. The problem is framed within a Markov decision process setting and is addressed through an adherence-aware deep Q network, which takes into account the partial complianc…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 6 pages, 5 figures, conference

  15. arXiv:2504.18425  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.MM cs.SD

    Kimi-Audio Technical Report

    Authors: KimiTeam, Ding Ding, Zeqian Ju, Yichong Leng, Songxiang Liu, Tong Liu, Zeyu Shang, Kai Shen, Wei Song, Xu Tan, Heyi Tang, Zhengtao Wang, Chu Wei, Yifei Xin, Xinran Xu, Jianwei Yu, Yutao Zhang, Xinyu Zhou, Y. Charles, Jun Chen, Yanru Chen, Yulun Du, Weiran He, Zhenxing Hu, Guokun Lai, et al. (15 additional authors not shown)

    Abstract: We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input a…

    Submitted 25 April, 2025; originally announced April 2025.

  16. arXiv:2504.17519  [pdf, other]

    cs.IR

    Replication and Exploration of Generative Retrieval over Dynamic Corpora

    Authors: Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Generative retrieval (GR) has emerged as a promising paradigm in information retrieval (IR). However, most existing GR models are developed and evaluated using a static document collection, and their performance in dynamic corpora where document collections evolve continuously is rarely studied. In this paper, we first reproduce and systematically evaluate various representative GR approaches over…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted at SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  17. arXiv:2504.16713  [pdf, ps, other]

    math.NA cs.CE

    Mixing Data-Driven and Physics-Based Constitutive Models using Uncertainty-Driven Phase Fields

    Authors: J. Storm, W. Sun, I. B. C. M. Rocha, F. P. van der Meer

    Abstract: There is a high interest in accelerating multiscale models using data-driven surrogate modeling techniques. Creating a large training dataset encompassing all relevant load scenarios is essential for a good surrogate, yet the computational cost of producing this data quickly becomes a limiting factor. Commonly, a pre-trained surrogate is used throughout the computational domain. Here, we introduce…

    Submitted 23 April, 2025; originally announced April 2025.

  18. arXiv:2504.16468  [pdf, other]

    quant-ph cs.ET

    HAQA: A Hardware-Guided and Fidelity-Aware Strategy for Efficient Qubit Mapping Optimization

    Authors: Wenjie Sun, Xiaoyu Li, Lianhui Yu, Zhigang Wang, Geng Chen, Desheng Zheng, Guowu Yang

    Abstract: Quantum algorithms rely on quantum computers for implementation, but the physical connectivity constraints of modern quantum processors impede the efficient realization of quantum algorithms. Qubit mapping, a critical technology for practical quantum computing applications, directly determines the execution efficiency and feasibility of algorithms on superconducting quantum processors. Existing ma…

    Submitted 23 April, 2025; originally announced April 2025.

    Comments: 27 pages

  19. arXiv:2504.16405  [pdf, other]

    cs.MM

    EEmo-Bench: A Benchmark for Multi-modal Large Language Models on Image Evoked Emotion Assessment

    Authors: Lancheng Gao, Ziheng Jia, Yunhao Zeng, Wei Sun, Yiming Zhang, Wei Zhou, Guangtao Zhai, Xiongkuo Min

    Abstract: The furnishing of multi-modal large language models (MLLMs) has led to the emergence of numerous benchmark studies, particularly those evaluating their perception and understanding capabilities. Among these, understanding image-evoked emotions aims to enhance MLLMs' empathy, with significant applications such as human-machine interaction and advertising recommendations. However, current evaluation…

    Submitted 7 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  20. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.

  21. arXiv:2504.14664  [pdf, other]

    cs.CV

    Frequency-domain Learning with Kernel Prior for Blind Image Deblurring

    Authors: Jixiang Sun, Fei Lei, Jiawei Zhang, Wenxiu Sun, Yujiu Yang

    Abstract: While achieving excellent results on various datasets, many deep learning methods for image deblurring suffer from limited generalization capabilities with out-of-domain data. This limitation is likely caused by their dependence on certain domain-specific datasets. To address this challenge, we argue that it is necessary to introduce the kernel prior into deep learning methods, as the kernel prior…

    Submitted 20 April, 2025; originally announced April 2025.

  22. arXiv:2504.13131  [pdf, other]

    eess.IV cs.AI cs.CV

    NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results

    Authors: Xin Li, Kun Yuan, Bingchen Li, Fengbin Guan, Yizhen Shao, Zihao Yu, Xijun Wang, Yiting Lu, Wei Luo, Suhang Yao, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Yabin Zhang, Ao-Xiang Zhang, Tianwu Zhi, Jianzhao Liu, Yang Li, Jingwen Xu, Yiting Liao, Yushen Zuo, Mingyang Wu, Renjie Li, Shengyun Zhong, et al. (88 additional authors not shown)

    Abstract: This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating re…

    Submitted 17 April, 2025; originally announced April 2025.

    Comments: Challenge Report of NTIRE 2025; Methods from 18 Teams; Accepted by CVPR Workshop; 21 pages

  23. arXiv:2504.11893  [pdf, other]

    cs.CV

    CAGS: Open-Vocabulary 3D Scene Understanding with Context-Aware Gaussian Splatting

    Authors: Wei Sun, Yanzhao Zhou, Jianbin Jiao, Yuan Li

    Abstract: Open-vocabulary 3D scene understanding is crucial for applications requiring natural language-driven spatial interpretation, such as robotics and augmented reality. While 3D Gaussian Splatting (3DGS) offers a powerful representation for scene reconstruction, integrating it with open-vocabulary frameworks reveals a key challenge: cross-view granularity inconsistency. This issue, stemming from 2D se…

    Submitted 16 April, 2025; originally announced April 2025.

  24. arXiv:2504.10433  [pdf, other]

    cs.CV cs.RO

    MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model

    Authors: Jian Liu, Wei Sun, Hui Yang, Jin Zheng, Zichen Geng, Hossein Rahmani, Ajmal Mian

    Abstract: Object pose estimation is a core means for robots to understand and interact with their environment. For this task, monocular category-level methods are attractive as they require only a single RGB camera. However, current methods rely on shape priors or CAD models of the intra-class known objects. We propose a diffusion-based monocular category-level 9D object pose generation method, MonoDiff9D.…

    Submitted 14 April, 2025; originally announced April 2025.

    Comments: Accepted by ICRA'25

  25. arXiv:2504.09627  [pdf, other]

    cs.IR cs.AI

    Slow Thinking for Sequential Recommendation

    Authors: Junjie Zhang, Beichen Zhang, Wenqi Sun, Hongyu Lu, Wayne Xin Zhao, Yu Chen, Ji-Rong Wen

    Abstract: To develop effective sequential recommender systems, numerous methods have been proposed to model historical user behaviors. Despite the effectiveness, these methods share the same fast thinking paradigm. That is, for making recommendations, these methods typically encode user historical interactions to obtain user representations and directly match these representations with candidate item repre…

    Submitted 13 April, 2025; originally announced April 2025.

  26. arXiv:2504.09255  [pdf, other]

    cs.CV

    FVQ: A Large-Scale Dataset and A LMM-based Method for Face Video Quality Assessment

    Authors: Sijing Wu, Yunhao Li, Ziwen Xu, Yixuan Gao, Huiyu Duan, Wei Sun, Guangtao Zhai

    Abstract: Face video quality assessment (FVQA) deserves to be explored in addition to general video quality assessment (VQA), as face videos are the primary content on social media platforms and human visual system (HVS) is particularly sensitive to human faces. However, FVQA is rarely explored due to the lack of large-scale FVQA datasets. To fill this gap, we present the first large-scale in-the-wild FVQA…

    Submitted 12 April, 2025; originally announced April 2025.

  27. arXiv:2504.08766  [pdf, other]

    cond-mat.soft cs.LG physics.comp-ph

    Towards scientific machine learning for granular material simulations -- challenges and opportunities

    Authors: Marc Fransen, Andreas Fürst, Deepak Tunuguntla, Daniel N. Wilke, Benedikt Alkin, Daniel Barreto, Johannes Brandstetter, Miguel Angel Cabrera, Xinyan Fan, Mengwu Guo, Bram Kieskamp, Krishna Kumar, John Morrissey, Jonathan Nuttall, Jin Ooi, Luisa Orozco, Stefanos-Aldo Papanicolopulos, Tongming Qu, Dingena Schott, Takayuki Shuku, WaiChing Sun, Thomas Weinhart, Dongwei Ye, Hongyang Cheng

    Abstract: Micro-scale mechanisms, such as inter-particle and particle-fluid interactions, govern the behaviour of granular systems. While particle-scale simulations provide detailed insights into these interactions, their computational cost is often prohibitive. Attended by researchers from both the granular materials (GM) and machine learning (ML) communities, a recent Lorentz Center Workshop on "Machine L…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: 35 pages, 17 figures

  28. arXiv:2504.08180  [pdf, other]

    cs.SE

    A Vulnerability Code Intent Summary Dataset

    Authors: Yifan Huang, Weisong Sun, Yubin Qu

    Abstract: In the era of Large Language Models (LLMs), the code summarization technique boosts a lot, along with the emergence of many new significant works. However, the potential of code summarization in the Computer Security Area still remains unexplored. Can we generate a code summary of a code snippet for its security intention? Thus, this work proposes an innovative large-scale multi-perspective Code Int…

    Submitted 10 April, 2025; originally announced April 2025.

  29. arXiv:2504.07439  [pdf, other]

    cs.IR cs.CL

    LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking

    Authors: Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao

    Abstract: Utilizing large language models (LLMs) for document reranking has been a popular and promising research direction in recent years, and many studies are dedicated to improving the performance and efficiency of using LLMs for reranking. Besides, it can also be applied in many real-world applications, such as search engines or retrieval-augmented generation. In response to the growing demand for research…

    Submitted 10 April, 2025; originally announced April 2025.

  30. arXiv:2504.07002  [pdf, ps, other]

    cs.CR cs.SE

    DeCoMa: Detecting and Purifying Code Dataset Watermarks through Dual Channel Code Abstraction

    Authors: Yuan Xiao, Yuchen Chen, Shiqing Ma, Haocheng Huang, Chunrong Fang, Yanwei Chen, Weisong Sun, Yunfeng Zhu, Xiaofang Zhang, Zhenyu Chen

    Abstract: Watermarking is a technique to help identify the source of data points, which can be used to help prevent the misuse of protected datasets. Existing methods on code watermarking, leveraging the idea from the backdoor research, embed stealthy triggers as watermarks. Despite their high resilience against dilution attacks and backdoor detections, the robustness has not been fully evaluated. To fill th…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted to ISSTA 2025. Code is available at https://github.com/xiaoyuanpigo/DeCoMa

  31. arXiv:2504.04310  [pdf, other]

    cs.CL cs.AI

    CO-Bench: Benchmarking Language Model Agents in Algorithm Search for Combinatorial Optimization

    Authors: Weiwei Sun, Shengyu Feng, Shanda Li, Yiming Yang

    Abstract: Although LLM-based agents have attracted significant attention in domains such as software engineering and machine learning research, their role in advancing combinatorial optimization (CO) remains relatively underexplored. This gap underscores the need for a deeper understanding of their potential in tackling structured, constraint-intensive problems-a pursuit currently limited by the absence of…

    Submitted 5 April, 2025; originally announced April 2025.

  32. arXiv:2504.00609  [pdf, other]

    cs.CV cs.LG

    Bi-Grid Reconstruction for Image Anomaly Detection

    Authors: Huichuan Huang, Zhiqing Zhong, Guangyu Wei, Yonghao Wan, Wenlong Sun, Aimin Feng

    Abstract: In image anomaly detection, significant advancements have been made using un- and self-supervised methods with datasets containing only normal samples. However, these approaches often struggle with fine-grained anomalies. This paper introduces GRAD: Bi-Grid Reconstruction for Image Anomaly Detection, which employs two continuous grids to enhance anomaly…

    Submitted 1 April, 2025; originally announced April 2025.

  33. arXiv:2503.23835  [pdf, other]

    cs.RO

    Disambiguate Gripper State in Grasp-Based Tasks: Pseudo-Tactile as Feedback Enables Pure Simulation Learning

    Authors: Yifei Yang, Lu Chen, Zherui Song, Yenan Chen, Wentao Sun, Zhongxiang Zhou, Rong Xiong, Yue Wang

    Abstract: Grasp-based manipulation tasks are fundamental to robots interacting with their environments, yet gripper state ambiguity significantly reduces the robustness of imitation learning policies for these tasks. Data-driven solutions face the challenge of high real-world data costs, while simulation data, despite its low costs, is limited by the sim-to-real gap. We identify the root cause of gripper st…

    Submitted 31 March, 2025; originally announced March 2025.

    Comments: 8 pages, 5 figures, submitted to IROS 2025, project page: https://yifei-y.github.io/project-pages/Pseudo-Tactile-Feedback/

  34. arXiv:2503.23429  [pdf, other]

    cs.RO

    A Visual-Inertial Motion Prior SLAM for Dynamic Environments

    Authors: Weilong Sun, Yumin Zhang, Boren Wei

    Abstract: The Visual-Inertial Simultaneous Localization and Mapping (VI-SLAM) algorithms which are mostly based on static assumption are widely used in fields such as robotics, UAVs, VR, and autonomous driving. To overcome the localization risks caused by dynamic landmarks in most VI-SLAM systems, a robust visual-inertial motion prior SLAM system, named IDY-VINS, is proposed in this paper which effectively…

    Submitted 13 April, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  35. arXiv:2503.22727  [pdf, other]

    cs.CL cs.LG

    A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI

    Authors: Alejandro Lozano, Min Woo Sun, James Burgess, Jeffrey J. Nirschl, Christopher Polzak, Yuhui Zhang, Liangyu Chen, Jeffrey Gu, Ivan Lopez, Josiah Aklilu, Anita Rau, Austin Wolfgang Katzer, Collin Chiu, Orr Zohar, Xiaohan Wang, Alfred Seunghoon Song, Chiang Chia-Chun, Robert Tibshirani, Serena Yeung-Levy

    Abstract: Despite the excitement behind biomedical artificial intelligence (AI), access to high-quality, diverse, and large-scale data - the foundation for modern AI systems - is still a bottleneck to unlocking its full potential. To address this gap, we introduce Biomedica, an open-source dataset derived from the PubMed Central Open Access subset, containing over 6 million scientific articles and 24 millio…

    Submitted 1 April, 2025; v1 submitted 26 March, 2025; originally announced March 2025.

  36. arXiv:2503.22480  [pdf, other]

    cs.LG

    Probabilistic Uncertain Reward Model

    Authors: Wangtao Sun, Xiang Cheng, Xing Yu, Haotian Xu, Zhao Yang, Shizhu He, Jun Zhao, Kang Liu

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a critical technique for training large language models. However, reward hacking-a phenomenon where models exploit flaws in the reward model-remains a significant barrier to achieving robust and scalable intelligence through long-term training. Existing studies have proposed the uncertain reward models to address reward hacking, howe…

    Submitted 8 May, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

  37. arXiv:2503.21614  [pdf, other]

    cs.CL

    A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond

    Authors: Xiaoye Qu, Yafu Li, Zhaochen Su, Weigao Sun, Jianhao Yan, Dongrui Liu, Ganqu Cui, Daizong Liu, Shuxian Liang, Junxian He, Peng Li, Wei Wei, Jing Shao, Chaochao Lu, Yue Zhang, Xian-Sheng Hua, Bowen Zhou, Yu Cheng

    Abstract: Recent Large Reasoning Models (LRMs), such as DeepSeek-R1 and OpenAI o1, have demonstrated strong performance gains by scaling up the length of Chain-of-Thought (CoT) reasoning during inference. However, a growing concern lies in their tendency to produce excessively long reasoning traces, which are often filled with redundant content (e.g., repeated definitions), over-analysis of simple problems,…

    Submitted 27 March, 2025; originally announced March 2025.

    Comments: Survey, 32 pages, Large Reasoning Models, Efficient Reasoning for Language, Multimodality, and Beyond

  38. arXiv:2503.21122  [pdf, other]

    cs.CV

    One Snapshot is All You Need: A Generalized Method for mmWave Signal Generation

    Authors: Teng Huang, Han Ding, Wenxin Sun, Cui Zhao, Ge Wang, Fei Wang, Kun Zhao, Zhi Wang, Wei Xi

    Abstract: Wireless sensing systems, particularly those using mmWave technology, offer distinct advantages over traditional vision-based approaches, such as enhanced privacy and effectiveness in poor lighting conditions. These systems, leveraging FMCW signals, have shown success in human-centric applications like localization, gesture recognition, and so on. However, comprehensive mmWave datasets for diverse…

    Submitted 26 March, 2025; originally announced March 2025.

    Comments: IEEE INFOCOM 2025

  39. arXiv:2503.20536  [pdf, other]

    cs.SE

    Knowledge-Based Multi-Agent Framework for Automated Software Architecture Design

    Authors: Yiran Zhang, Ruiyin Li, Peng Liang, Weisong Sun, Yang Liu

    Abstract: Architecture design is a critical step in software development. However, creating a high-quality architecture is often costly due to the significant need for human expertise and manual effort. Recently, agents built upon Large Language Models (LLMs) have achieved remarkable success in various software engineering tasks. Despite this progress, the use of agents to automate the architecture design p…

    Submitted 26 March, 2025; originally announced March 2025.

  40. arXiv:2503.13162  [pdf, other]

    cs.LG cs.AI

    Efficient Imitation under Misspecification

    Authors: Nicolas Espinosa-Dice, Sanjiban Choudhury, Wen Sun, Gokul Swamy

    Abstract: We consider the problem of imitation learning under misspecification: settings where the learner is fundamentally unable to replicate expert behavior everywhere. This is often true in practice due to differences in observation space and action space expressiveness (e.g. perceptual or morphological differences between robots and humans). Given the learner must make some mistakes in the misspecified…

    Submitted 2 April, 2025; v1 submitted 17 March, 2025; originally announced March 2025.

    Comments: 38 pages, 6 figures. Published as a conference paper at ICLR 2025

  41. arXiv:2503.12098   

    cs.LG

    Eval-PPO: Building an Efficient Threat Evaluator Using Proximal Policy Optimization

    Authors: Wuzhou Sun, Siyi Li, Qingxiang Zou, Zixing Liao

    Abstract: In various game scenarios, selecting a fixed number of targets from multiple enemy units is an extremely challenging task. This difficulty stems from the complex relationship between the threat levels of enemy units and their feature characteristics, which complicates the design of rule-based evaluators. Moreover, traditional supervised learning methods face the challenge of lacking explicit label…

    Submitted 25 April, 2025; v1 submitted 15 March, 2025; originally announced March 2025.

    Comments: The research content is not yet complete and requires further supplementation and improvement

  42. arXiv:2503.11251  [pdf, other]

    cs.CV cs.CL

    Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model

    Authors: Haoyang Huang, Guoqing Ma, Nan Duan, Xing Chen, Changyi Wan, Ranchen Ming, Tianyu Wang, Bo Wang, Zhiying Lu, Aojie Li, Xianfang Zeng, Xinhao Zhang, Gang Yu, Yuhe Yin, Qiling Wu, Wen Sun, Kang An, Xin Han, Deshan Sun, Wei Ji, Bizhu Huang, Brian Li, Chenfei Wu, Guanzhe Huang, Huixin Xiong , et al. (29 additional authors not shown)

    Abstract: We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs. We build Step-Video-TI2V-Eval as a new benchmark for the text-driven image-to-video task and compare Step-Video-TI2V with open-source and commercial TI2V engines using this dataset. Experimental results de…

    Submitted 14 March, 2025; originally announced March 2025.

    Comments: 7 pages

  43. arXiv:2503.10737  [pdf, other]

    cs.SE cs.AI

    Commenting Higher-level Code Unit: Full Code, Reduced Code, or Hierarchical Code Summarization

    Authors: Weisong Sun, Yiran Zhang, Jie Zhu, Zhihui Wang, Chunrong Fang, Yonglong Zhang, Yebo Feng, Jiangping Huang, Xingya Wang, Zhi Jin, Yang Liu

    Abstract: Commenting code is a crucial activity in software development, as it aids in facilitating future maintenance and updates. To enhance the efficiency of writing comments and reduce developers' workload, researchers have proposed various automated code summarization (ACS) techniques to automatically generate comments/summaries for given code units. However, these ACS techniques primarily focus on gene…

    Submitted 13 March, 2025; originally announced March 2025.

    MSC Class: 68-04 ACM Class: D.2.3; I.2.7

  44. arXiv:2503.10403  [pdf, other]

    cs.CV

    Hyper3D: Efficient 3D Representation via Hybrid Triplane and Octree Feature for Enhanced 3D Shape Variational Auto-Encoders

    Authors: Jingyu Guo, Sensen Gao, Jia-Wang Bian, Wanhu Sun, Heliang Zheng, Rongfei Jia, Mingming Gong

    Abstract: Recent 3D content generation pipelines often leverage Variational Autoencoders (VAEs) to encode shapes into compact latent representations, facilitating diffusion-based generation. Efficiently compressing 3D shapes while preserving intricate geometric details remains a key challenge. Existing 3D shape VAEs often employ uniform point sampling and 1D/2D latent representations, such as vector sets or…

    Submitted 13 March, 2025; originally announced March 2025.

  45. arXiv:2503.10096  [pdf, other]

    cs.CV

    Semantic Latent Motion for Portrait Video Generation

    Authors: Qiyuan Zhang, Chenyu Wu, Wenzhang Sun, Huaize Liu, Donglin Di, Wei Chen, Changqing Zou

    Abstract: Recent advancements in portrait video generation have been noteworthy. However, existing methods rely heavily on human priors and pre-trained generation models, which may introduce unrealistic motion and lead to inefficient inference. To address these challenges, we propose Semantic Latent Motion (SeMo), a compact and expressive motion representation. Leveraging this representation, our approach a…

    Submitted 13 March, 2025; originally announced March 2025.

  46. arXiv:2503.08005  [pdf, other]

    cs.CV

    CDI3D: Cross-guided Dense-view Interpolation for 3D Reconstruction

    Authors: Zhiyuan Wu, Xibin Song, Senbo Wang, Weizhe Liu, Jiayu Yang, Ziang Cheng, Shenzhou Chen, Taizhang Shang, Weixuan Sun, Shan Luo, Pan Ji

    Abstract: 3D object reconstruction from a single-view image is a fundamental task in computer vision with wide-ranging applications. Recent advancements in Large Reconstruction Models (LRMs) have shown great promise in leveraging multi-view images generated by 2D diffusion models to extract 3D content. However, challenges remain as 2D diffusion models often struggle to produce dense images with strong multi-v…

    Submitted 11 March, 2025; v1 submitted 10 March, 2025; originally announced March 2025.

  47. arXiv:2503.06660  [pdf, other]

    cs.CV

    AxisPose: Model-Free Matching-Free Single-Shot 6D Object Pose Estimation via Axis Generation

    Authors: Yang Zou, Zhaoshuai Qi, Yating Liu, Zihao Xu, Weipeng Sun, Weiyi Liu, Xingyuan Li, Jiaqi Yang, Yanning Zhang

    Abstract: Object pose estimation, which plays a vital role in robotics, augmented reality, and autonomous driving, has been of great interest in computer vision. Existing studies either require multi-stage pose regression or rely on 2D-3D feature matching. Though these approaches have shown promising results, they rely heavily on appearance information, requiring complex input (i.e., multi-view reference in…

    Submitted 9 March, 2025; originally announced March 2025.

    MSC Class: 68T45 ACM Class: I.4.3

  48. arXiv:2503.06659  [pdf, other]

    cs.HC

    PANDA: Parkinson's Assistance and Notification Driving Aid

    Authors: Tianyang Wen, Xucheng Zhang, Zhirong Wan, Jing Zhao, Yicheng Zhu, Ning Su, Xiaolan Peng, Jin Huang, Wei Sun, Feng Tian, Franklin Mingzhe Li

    Abstract: Parkinson's Disease (PD) significantly impacts driving abilities, often leading to early driving cessation or accidents due to reduced motor control and increasing reaction times. To diminish the impact of these symptoms, we developed PANDA (Parkinson's Assistance and Notification Driving Aid), a multi-modality real-time alert system designed to monitor driving patterns continuously and provide im…

    Submitted 9 March, 2025; originally announced March 2025.

  49. arXiv:2503.05578  [pdf, other]

    cs.CV cs.RO

    Novel Object 6D Pose Estimation with a Single Reference View

    Authors: Jian Liu, Wei Sun, Kai Zeng, Jin Zheng, Hui Yang, Lin Wang, Hossein Rahmani, Ajmal Mian

    Abstract: Existing novel object 6D pose estimation methods typically rely on CAD models or dense reference views, which are both difficult to acquire. Using only a single reference view is more scalable, but challenging due to large pose discrepancies and limited geometric and spatial information. To address these issues, we propose a Single-Reference-based novel object 6D (SinRef-6D) pose estimation method…

    Submitted 7 March, 2025; originally announced March 2025.

    Comments: 17 pages, 12 figures (including supplementary material)

  50. arXiv:2503.05447  [pdf, other]

    cs.LG cs.AI cs.CL cs.DC

    Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts

    Authors: Weigao Sun, Disen Lan, Tong Zhu, Xiaoye Qu, Yu Cheng

    Abstract: Linear Sequence Modeling (LSM) like linear attention, state space models and linear RNNs, and Mixture-of-Experts (MoE) have recently emerged as significant architectural improvements. In this paper, we introduce Linear-MoE, a production-level system for modeling and training large-scale models that integrate LSM with MoE. Linear-MoE leverages the advantages of both LSM modules for linear-complexit…

    Submitted 15 April, 2025; v1 submitted 7 March, 2025; originally announced March 2025.

    Comments: Technical report, 17 pages