Showing 1–50 of 2,956 results for author: Wu, J

Searching in archive cs.
  1. arXiv:2505.10222  [pdf, ps, other]

    cs.LG cs.CL

    ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention

    Authors: Jintian Shao, Hongyi Huang, Jiayi Wu, Beiwen Zhang, ZhiYu Wu, You Shan, MingKai Zheng

    Abstract: Transformer models rely on self-attention to capture token dependencies but face challenges in effectively integrating positional information while allowing multi-head attention (MHA) flexibility. Prior methods often model semantic and positional differences disparately or apply uniform positional adjustments across heads, potentially limiting representational capacity. This paper introduces Compl…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.10202  [pdf, other]

    cs.CL

    VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits

    Authors: Jintian Shao, Hongyi Huang, Jiayi Wu, YiMing Cheng, ZhiYu Wu, You Shan, MingKai Zheng

    Abstract: Large Language Models (LLMs) have achieved remarkable success but face significant computational and memory challenges, particularly due to their extensive output vocabularies. The final linear projection layer, mapping hidden states to vocabulary-sized logits, often constitutes a substantial portion of the model's parameters and computational cost during inference. Existing methods like adaptive…

    Submitted 15 May, 2025; originally announced May 2025.

  3. arXiv:2505.10117  [pdf, other]

    cs.LG cs.CL

    Learning Virtual Machine Scheduling in Cloud Computing through Language Agents

    Authors: JieHao Wu, Ziwei Wang, Junjie Sheng, Wenhao Li, Xiangfei Wang, Jun Luo

    Abstract: In cloud services, virtual machine (VM) scheduling is a typical Online Dynamic Multidimensional Bin Packing (ODMBP) problem, characterized by large-scale complexity and fluctuating demands. Traditional optimization methods struggle to adapt to real-time changes, domain-expert-designed heuristic approaches suffer from rigid strategies, and existing learning-based methods often lack generalizability…

    Submitted 15 May, 2025; originally announced May 2025.

  4. arXiv:2505.08704  [pdf, ps, other]

    cs.AI cs.CL

    LLM-based Prompt Ensemble for Reliable Medical Entity Recognition from EHRs

    Authors: K M Sajjadul Islam, Ayesha Siddika Nipu, Jiawei Wu, Praveen Madiraju

    Abstract: Electronic Health Records (EHRs) are digital records of patient information, often containing unstructured clinical text. Named Entity Recognition (NER) is essential in EHRs for extracting key medical entities like problems, tests, and treatments to support downstream clinical applications. This paper explores prompt-based medical entity recognition using large language models (LLMs), specifically…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: IEEE 26th International Conference on Information Reuse and Integration for Data Science (IRI 2025), San Jose, CA, USA

  5. arXiv:2505.08581  [pdf, other]

    cs.CV eess.IV q-bio.TO

    ReSurgSAM2: Referring Segment Anything in Surgical Video via Credible Long-term Tracking

    Authors: Haofeng Liu, Mingqi Gao, Xuxiao Luo, Ziyue Wang, Guanyi Qin, Junde Wu, Yueming Jin

    Abstract: Surgical scene segmentation is critical in computer-assisted surgery and is vital for enhancing surgical quality and patient outcomes. Recently, referring surgical segmentation is emerging, given its advantage of providing surgeons with an interactive experience to segment the target object. However, existing methods are limited by low efficiency and short-term tracking, hindering their applicabil…

    Submitted 13 May, 2025; originally announced May 2025.

    Comments: Early accepted by MICCAI 2025

  6. arXiv:2505.08235  [pdf, other]

    cs.CV

    EventDiff: A Unified and Efficient Diffusion Model Framework for Event-based Video Frame Interpolation

    Authors: Hanle Zheng, Xujie Han, Zegang Peng, Shangbin Zhang, Guangxun Du, Zhuo Zou, Xilin Wang, Jibin Wu, Hao Guo, Lei Deng

    Abstract: Video Frame Interpolation (VFI) is a fundamental yet challenging task in computer vision, particularly under conditions involving large motion, occlusion, and lighting variation. Recent advancements in event cameras have opened up new opportunities for addressing these challenges. While existing event-based VFI methods have succeeded in recovering large and complex motions by leveraging handcrafte…

    Submitted 13 May, 2025; originally announced May 2025.

  7. arXiv:2505.08168  [pdf, ps, other]

    cs.CL cs.AI

    Exploiting Text Semantics for Few and Zero Shot Node Classification on Text-attributed Graph

    Authors: Yuxiang Wang, Xiao Yan, Shiyu Jin, Quanqing Xu, Chuang Hu, Yuanyuan Zhu, Bo Du, Jia Wu, Jiawei Jiang

    Abstract: Text-attributed graph (TAG) provides a text description for each graph node, and few- and zero-shot node classification on TAGs has many applications in fields such as academia and social networks. Existing work utilizes various graph-based augmentation techniques to train the node and text embeddings, while text-based augmentations are largely unexplored. In this paper, we propose Text Semantics…

    Submitted 12 May, 2025; originally announced May 2025.

  8. arXiv:2505.07818  [pdf, other]

    cs.CV

    DanceGRPO: Unleashing GRPO on Visual Generation

    Authors: Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, Ping Luo

    Abstract: Recent breakthroughs in generative models, particularly diffusion models and rectified flows, have revolutionized visual content creation, yet aligning model outputs with human preferences remains a critical challenge. Existing reinforcement learning (RL)-based methods for visual generation face critical limitations: incompatibility with modern Ordinary Differential Equations (ODEs)-based sampling p…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project Page: https://dancegrpo.github.io/

  9. arXiv:2505.07062  [pdf, ps, other]

    cs.CV cs.AI

    Seed1.5-VL Technical Report

    Authors: Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, Jingji Chen, Jingjia Huang, Kang Lei, Liping Yuan, Lishu Luo, Pengfei Liu, Qinghao Ye, Rui Qian, Shen Yan, Shixiong Zhao, Shuai Peng, Shuangye Li, Sihang Yuan, Sijin Wu, Tianheng Cheng , et al. (172 additional authors not shown)

    Abstract: We present Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. Seed1.5-VL is composed of a 532M-parameter vision encoder and a Mixture-of-Experts (MoE) LLM of 20B active parameters. Despite its relatively compact architecture, it delivers strong performance across a wide spectrum of public VLM benchmarks and internal evaluati…

    Submitted 11 May, 2025; originally announced May 2025.

  10. arXiv:2505.06803  [pdf, other]

    cs.SD cs.CL cs.CV cs.MM eess.AS

    Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation

    Authors: Xilin Jiang, Junkai Wu, Vishal Choudhari, Nima Mesgarani

    Abstract: Audio large language models (LLMs) are considered experts at recognizing sound objects, yet their performance relative to LLMs in other sensory modalities, such as visual or audio-visual LLMs, and to humans using their ears, eyes, or both remains unexplored. To investigate this, we systematically evaluate audio, visual, and audio-visual LLMs, specifically Qwen2-Audio, Qwen2-VL, and Qwen2.5-Omni, a…

    Submitted 10 May, 2025; originally announced May 2025.

  11. arXiv:2505.06269  [pdf]

    cs.LG

    A machine learning model for skillful climate system prediction

    Authors: Chenguang Zhou, Lei Chen, Xiaohui Zhong, Bo Lu, Hao Li, Libo Wu, Jie Wu, Jiahui Hu, Zesheng Dou, Pang-Chi Hsu, Xiaoye Zhang

    Abstract: Climate system models (CSMs), through integrating cross-sphere interactions among the atmosphere, ocean, land, and cryosphere, have emerged as pivotal tools for deciphering climate dynamics and improving forecasting capabilities. Recent breakthroughs in artificial intelligence (AI)-driven meteorological modeling have demonstrated remarkable success in single-sphere systems and partially spheres co…

    Submitted 5 May, 2025; originally announced May 2025.

  12. arXiv:2505.06227  [pdf, ps, other]

    cs.GR cs.CV

    Anymate: A Dataset and Baselines for Learning 3D Object Rigging

    Authors: Yufan Deng, Yuhao Zhang, Chen Geng, Shangzhe Wu, Jiajun Wu

    Abstract: Rigging and skinning are essential steps to create realistic 3D animations, often requiring significant expertise and manual effort. Traditional attempts at automating these processes rely heavily on geometric heuristics and often struggle with objects of complex geometry. Recent data-driven approaches show potential for better generality, but are often constrained by limited training data. We pre…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: SIGGRAPH 2025. Project page: https://anymate3d.github.io/

  13. arXiv:2505.06191  [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    Neuro-Symbolic Concepts

    Authors: Jiayuan Mao, Joshua B. Tenenbaum, Jiajun Wu

    Abstract: This article presents a concept-centric paradigm for building agents that can learn continually and reason flexibly. The concept-centric agent utilizes a vocabulary of neuro-symbolic concepts. These concepts, such as object, relation, and action concepts, are grounded on sensory inputs and actuation outputs. They are also compositional, allowing for the creation of novel concepts through their str…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: To appear in Communications of the ACM

  14. arXiv:2505.05834  [pdf, other]

    cs.CV

    Dual-level Fuzzy Learning with Patch Guidance for Image Ordinal Regression

    Authors: Chunlai Dong, Haochao Ying, Qibo Qiu, Jinhong Wang, Danny Chen, Jian Wu

    Abstract: Ordinal regression bridges regression and classification by assigning objects to ordered classes. While human experts rely on discriminative patch-level features for decisions, current approaches are limited by the availability of only image-level ordinal labels, overlooking fine-grained patch-level characteristics. In this paper, we propose a Dual-level Fuzzy Learning with Patch Guidance framewor…

    Submitted 9 May, 2025; originally announced May 2025.

    Comments: Accepted by IJCAI 2025

  15. arXiv:2505.05477  [pdf]

    eess.SP cs.CV

    ECGDeDRDNet: A deep learning-based method for Electrocardiogram noise removal using a double recurrent dense network

    Authors: Sainan Xiao, Wangdong Yang, Buwen Cao, Jintao Wu

    Abstract: Electrocardiogram (ECG) signals are frequently corrupted by noise, such as baseline wander (BW), muscle artifacts (MA), and electrode motion (EM), which significantly degrade their diagnostic utility. To address this issue, we propose ECGDeDRDNet, a deep learning-based ECG Denoising framework leveraging a Double Recurrent Dense Network architecture. In contrast to traditional approaches, we introd…

    Submitted 22 April, 2025; originally announced May 2025.

  16. arXiv:2505.05472  [pdf, other]

    cs.CV

    Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation

    Authors: Chao Liao, Liyang Liu, Xun Wang, Zhengxiong Luo, Xinyu Zhang, Wenliang Zhao, Jie Wu, Liang Li, Zhi Tian, Weilin Huang

    Abstract: Recent progress in unified models for image understanding and generation has been impressive, yet most approaches remain limited to single-modal generation conditioned on multiple modalities. In this paper, we present Mogao, a unified framework that advances this paradigm by enabling interleaved multi-modal generation through a causal approach. Mogao integrates a set of key technical improvements…

    Submitted 11 May, 2025; v1 submitted 8 May, 2025; originally announced May 2025.

    Comments: Mogao Technical Report

  17. arXiv:2505.04094  [pdf, other]

    cs.CR cs.SE

    SolPhishHunter: Towards Detecting and Understanding Phishing on Solana

    Authors: Ziwei Li, Zigui Jiang, Ming Fang, Jiaxin Chen, Zhiying Wu, Jiajing Wu, Lun Zhang, Zibin Zheng

    Abstract: Solana is a rapidly evolving blockchain platform that has attracted an increasing number of users. However, this growth has also drawn the attention of malicious actors, with some phishers extending their reach into the Solana ecosystem. Unlike platforms such as Ethereum, Solana has distinct designs of accounts and transactions, leading to the emergence of new types of phishing transactions that w…

    Submitted 6 May, 2025; originally announced May 2025.

  18. arXiv:2505.03793  [pdf, other]

    cs.LG cs.AI

    LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection

    Authors: Xinyue Zeng, Haohui Wang, Junhong Lin, Jun Wu, Tyler Cody, Dawei Zhou

    Abstract: The proliferation of open-sourced Large Language Models (LLMs) and diverse downstream tasks necessitates efficient model selection, given the impracticality of fine-tuning all candidates due to computational constraints. Despite the recent advances in LLM selection, a fundamental research question largely remains nascent: how can we model the dynamic behaviors of LLMs during fine-tuning, thereby e…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Accepted by ICML 2025; the code is open-sourced at https://github.com/Susan571/LENSLLM.git

  19. arXiv:2505.03603  [pdf, other]

    cs.CV cs.MM

    PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model

    Authors: S. Z. Zhou, Y. B. Wang, J. F. Wu, T. Hu, J. N. Zhang, Z. J. Li, Y. Liu

    Abstract: Audio-driven human animation technology is widely used in human-computer interaction, and the emergence of diffusion models has further advanced its development. Currently, most methods rely on multi-stage generation and intermediate representations, resulting in long inference time and issues with generation quality in specific foreground regions and audio-motion consistency. These shortcomings a…

    Submitted 11 May, 2025; v1 submitted 6 May, 2025; originally announced May 2025.

  20. arXiv:2505.03448  [pdf, other]

    cs.RO

    AquaticVision: Benchmarking Visual SLAM in Underwater Environment with Events and Frames

    Authors: Yifan Peng, Yuze Hong, Ziyang Hong, Apple Pui-Yi Chui, Junfeng Wu

    Abstract: Many underwater applications, such as offshore asset inspections, rely on visual inspection and detailed 3D reconstruction. Recent advancements in underwater visual SLAM systems for aquatic environments have garnered significant attention in marine robotics research. However, existing underwater visual SLAM datasets often lack groundtruth trajectory data, making it difficult to objectively compare…

    Submitted 6 May, 2025; originally announced May 2025.

  21. arXiv:2505.02833  [pdf, other]

    cs.RO cs.CV cs.LG

    TWIST: Teleoperated Whole-Body Imitation System

    Authors: Yanjie Ze, Zixuan Chen, João Pedro Araújo, Zi-ang Cao, Xue Bin Peng, Jiajun Wu, C. Karen Liu

    Abstract: Teleoperating humanoid robots in a whole-body manner marks a fundamental step toward developing general-purpose robotic intelligence, with human motion providing an ideal interface for controlling all degrees of freedom. Yet, most current humanoid teleoperation systems fall short of enabling coordinated whole-body behavior, typically limiting themselves to isolated locomotion or manipulation tasks…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: Project website: https://humanoid-teleop.github.io

  22. arXiv:2505.02629  [pdf, other]

    cs.SE

    Parameter-Efficient Fine-Tuning with Attributed Patch Semantic Graph for Automated Patch Correctness Assessment

    Authors: Zhenyu Yang, Jingwen Wu, Zhen Yang, Zhongxing Yu

    Abstract: Automated program repair (APR) aims to automatically repair program errors without human intervention, and recent years have witnessed a growing interest in this research topic. While much progress has been made and techniques originating from different disciplines have been proposed, APR techniques generally suffer from the patch overfitting issue, i.e., the generated patches are not genuinely co…

    Submitted 5 May, 2025; originally announced May 2025.

    Comments: 16 pages, 4 figures, 12 tables

  23. arXiv:2505.01880  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS

    Weakly-supervised Audio Temporal Forgery Localization via Progressive Audio-language Co-learning Network

    Authors: Junyan Wu, Wenbo Xu, Wei Lu, Xiangyang Luo, Rui Yang, Shize Guo

    Abstract: Audio temporal forgery localization (ATFL) aims to find the precise forgery regions of the partial spoof audio that is purposefully modified. Existing ATFL methods rely on training efficient networks using fine-grained annotations, which are costly and challenging to obtain in real-world scenarios. To meet this challenge, in this paper, we propose a progressive audio-language co-learning network (L…

    Submitted 7 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

    Comments: 9 pages, 5 figures. This paper has been accepted for IJCAI 2025

  24. arXiv:2505.01113  [pdf, other]

    cs.RO cs.CV cs.NE

    NeuroLoc: Encoding Navigation Cells for 6-DOF Camera Localization

    Authors: Xun Li, Jian Yang, Fenli Jia, Muyu Wang, Qi Wu, Jun Wu, Jinpeng Mi, Jilin Hu, Peidong Liang, Xuan Tang, Ke Li, Xiong You, Xian Wei

    Abstract: Recently, camera localization has been widely adopted in autonomous robotic navigation due to its efficiency and convenience. However, autonomous navigation in unknown environments often suffers from scene ambiguity, environmental disturbances, and dynamic object transformation in camera localization. To address this problem, inspired by the biological brain navigation mechanism (such as grid cell…

    Submitted 2 May, 2025; originally announced May 2025.

  25. arXiv:2505.00592  [pdf, other]

    cs.CV cs.LG

    Uncertainty-Aware Multi-Expert Knowledge Distillation for Imbalanced Disease Grading

    Authors: Shuo Tong, Shangde Gao, Ke Liu, Zihang Huang, Hongxia Xu, Haochao Ying, Jian Wu

    Abstract: Automatic disease image grading is a significant application of artificial intelligence for healthcare, enabling faster and more accurate patient assessments. However, domain shifts, which are exacerbated by data imbalance, introduce bias into the model, posing deployment difficulties in clinical applications. To address the problem, we propose a novel Uncertainty-aware Multi-exp…

    Submitted 1 May, 2025; originally announced May 2025.

  26. arXiv:2505.00049  [pdf, other]

    cs.CY cs.CL cs.HC cs.LG

    Humanizing LLMs: A Survey of Psychological Measurements with Tools, Datasets, and Human-Agent Applications

    Authors: Wenhan Dong, Yuemeng Zhao, Zhen Sun, Yule Liu, Zifan Peng, Jingyi Zheng, Zongmin Zhang, Ziyi Zhang, Jun Wu, Ruiming Wang, Shengmin Xu, Xinyi Huang, Xinlei He

    Abstract: As large language models (LLMs) are increasingly used in human-centered tasks, assessing their psychological traits is crucial for understanding their social impact and ensuring trustworthy AI alignment. While existing reviews have covered some aspects of related research, several important areas have not been systematically discussed, including detailed discussions of diverse psychological tests,…

    Submitted 30 April, 2025; originally announced May 2025.

    Comments: 26 pages, 7 figures

  27. arXiv:2505.00018  [pdf, ps, other]

    cs.AI cs.HC cs.MA

    Position Paper: Towards Open Complex Human-AI Agents Collaboration System for Problem-Solving and Knowledge Management

    Authors: Ju Wu, Calvin K. L. Or

    Abstract: This position paper critically surveys a broad spectrum of recent empirical developments on human-AI agents collaboration, highlighting both their technical achievements and persistent gaps. We observe a lack of a unifying theoretical framework that can coherently integrate these varied studies, especially when tackling open-ended, complex tasks. To address this, we propose a novel conceptual arch…

    Submitted 24 April, 2025; originally announced May 2025.

  28. arXiv:2504.21739  [pdf, other]

    cs.CR

    Bilateral Differentially Private Vertical Federated Boosted Decision Trees

    Authors: Bokang Zhang, Zhikun Zhang, Haodong Jiang, Yang Liu, Lihao Zheng, Yuxiao Zhou, Shuaiting Huang, Junfeng Wu

    Abstract: Federated learning is a distributed machine learning paradigm that enables collaborative training across multiple parties while ensuring data privacy. Gradient Boosting Decision Trees (GBDT), such as XGBoost, have gained popularity due to their high performance and strong interpretability. Therefore, there has been a growing interest in adapting XGBoost for use in federated settings via cryptograp…

    Submitted 30 April, 2025; originally announced April 2025.

  29. arXiv:2504.21583  [pdf, other]

    cs.NI

    Toward Realization of Low-Altitude Economy Networks: Core Architecture, Integrated Technologies, and Future Directions

    Authors: Yixian Wang, Geng Sun, Zemin Sun, Jiacheng Wang, Jiahui Li, Changyuan Zhao, Jing Wu, Shuang Liang, Minghao Yin, Pengfei Wang, Dusit Niyato, Sumei Sun, Dong In Kim

    Abstract: The rise of the low-altitude economy (LAE) is propelling urban development and emerging industries by integrating advanced technologies to enhance efficiency, safety, and sustainability in low-altitude operations. The widespread adoption of unmanned aerial vehicles (UAVs) and electric vertical takeoff and landing (eVTOL) aircraft plays a crucial role in enabling key applications within LAE, such a…

    Submitted 30 April, 2025; originally announced April 2025.

    Comments: 25 pages, 12 figures, published in TCCN

  30. arXiv:2504.21228  [pdf, other]

    cs.CR cs.AI

    CachePrune: Neural-Based Attribution Defense Against Indirect Prompt Injection Attacks

    Authors: Rui Wang, Junda Wu, Yu Xia, Tong Yu, Ruiyi Zhang, Ryan Rossi, Lina Yao, Julian McAuley

    Abstract: Large Language Models (LLMs) are identified as being susceptible to indirect prompt injection attacks, where the model undesirably deviates from user-provided instructions by executing tasks injected in the prompt context. This vulnerability stems from LLMs' inability to distinguish between data and instructions within a prompt. In this paper, we propose CachePrune that defends against this attack…

    Submitted 29 April, 2025; originally announced April 2025.

  31. arXiv:2504.20869  [pdf, other]

    cs.LG cs.AI cs.CR

    Quantifying the Noise of Structural Perturbations on Graph Adversarial Attacks

    Authors: Junyuan Fang, Han Yang, Haixian Wen, Jiajing Wu, Zibin Zheng, Chi K. Tse

    Abstract: Graph neural networks have been widely utilized to solve graph-related tasks because of their strong learning power in utilizing the local information of neighbors. However, recent studies on graph adversarial attacks have proven that current graph neural networks are not robust against malicious attacks. Yet much of the existing work has focused on the optimization objective based on attack perfo…

    Submitted 29 April, 2025; v1 submitted 29 April, 2025; originally announced April 2025.

    Comments: Under Review

  32. arXiv:2504.20848  [pdf, other]

    cs.LG cs.AI cs.CR

    Mitigating the Structural Bias in Graph Adversarial Defenses

    Authors: Junyuan Fang, Huimin Liu, Han Yang, Jiajing Wu, Zibin Zheng, Chi K. Tse

    Abstract: In recent years, graph neural networks (GNNs) have shown great potential in addressing various graph structure-related downstream tasks. However, recent studies have found that current GNNs are susceptible to malicious adversarial attacks. Given the inevitable presence of adversarial attacks in the real world, a variety of defense methods have been proposed to counter these attacks and enhance the…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Under Review

  33. arXiv:2504.20830  [pdf, other]

    cs.CV

    CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation

    Authors: Jianyu Wu, Yizhou Wang, Xiangyu Yue, Xinzhu Ma, Jingyang Guo, Dongzhan Zhou, Wanli Ouyang, Shixiang Tang

    Abstract: While accurate and user-friendly Computer-Aided Design (CAD) is crucial for industrial design and manufacturing, existing methods still struggle to achieve this due to their over-simplified representations or architectures incapable of supporting multimodal design requirements. In this paper, we attempt to tackle this problem from both the method and dataset aspects. First, we propose a cascade MAR…

    Submitted 29 April, 2025; originally announced April 2025.

  34. arXiv:2504.20378  [pdf, other]

    cs.CV

    Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views

    Authors: Jiang Wu, Rui Li, Yu Zhu, Rong Guo, Jinqiu Sun, Yanning Zhang

    Abstract: We present a Gaussian Splatting method for surface reconstruction using sparse input views. Previous methods relying on dense views struggle with extremely sparse Structure-from-Motion points for initialization. While learning-based Multi-view Stereo (MVS) provides dense 3D points, directly combining it with Gaussian Splatting leads to suboptimal results due to the ill-posed nature of sparse-view…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: CVPR 2025

  35. arXiv:2504.20319  [pdf, other]

    cs.LG

    Bayesian Experimental Design for Model Discrepancy Calibration: An Auto-Differentiable Ensemble Kalman Inversion Approach

    Authors: Huchen Yang, Xinghao Dong, Jin-Long Wu

    Abstract: Bayesian experimental design (BED) offers a principled framework for optimizing data acquisition by leveraging probabilistic inference. However, practical implementations of BED are often compromised by model discrepancy, i.e., the mismatch between predictive models and true physical systems, which can potentially lead to biased parameter estimates. While data-driven approaches have been recently…

    Submitted 28 April, 2025; originally announced April 2025.

  36. arXiv:2504.20073  [pdf, other]

    cs.LG cs.AI cs.CL

    RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning

    Authors: Zihan Wang, Kangrui Wang, Qineng Wang, Pingyue Zhang, Linjie Li, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Licheng Liu, Eli Gottlieb, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li

    Abstract: Training large language models (LLMs) as interactive agents presents unique challenges including long-horizon decision making and interacting with stochastic environment feedback. While reinforcement learning (RL) has enabled progress in static tasks, multi-turn agent RL training remains underexplored. We propose StarPO (State-Thinking-Actions-Reward Policy Optimization), a general framework for t…

    Submitted 24 April, 2025; originally announced April 2025.

  37. arXiv:2504.19467  [pdf]

    cs.CL cs.AI

    BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text

    Authors: Jiageng Wu, Bowen Gu, Ren Zhou, Kevin Xie, Doug Snyder, Yixing Jiang, Valentina Carducci, Richard Wyss, Rishi J Desai, Emily Alsentzer, Leo Anthony Celi, Adam Rodman, Sebastian Schneeweiss, Jonathan H. Chen, Santiago Romero-Brufau, Kueiyu Joshua Lin, Jie Yang

    Abstract: Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, current evaluations of LLMs in clinical contexts remain limited. Most existing benchmarks rely on medical exam-style questions or PubMed-derived text, failing to capture the complexity of real-world electronic health record (EHR) data. O…

    Submitted 30 April, 2025; v1 submitted 28 April, 2025; originally announced April 2025.

  38. arXiv:2504.19093  [pdf, other]

    cs.CR cs.AI cs.PF

    CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges

    Authors: Yu Li, Qizhi Pei, Mengyuan Sun, Honglin Lin, Chenlin Ming, Xin Gao, Jiang Wu, Conghui He, Lijun Wu

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, especially the recent advancements in reasoning, such as o1 and o3, pushing the boundaries of AI. Despite these impressive achievements in mathematics and coding, the reasoning abilities of LLMs in domains requiring cryptographic expertise remain underexplored. In this paper, we introduce CipherBank, a comprehensive benchmark…

    Submitted 26 April, 2025; originally announced April 2025.

    Comments: Work in progress

  39. arXiv:2504.18428  [pdf, other]

    cs.CL

    PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts

    Authors: Yiming Wang, Pei Zhang, Jialong Tang, Haoran Wei, Baosong Yang, Rui Wang, Chenshu Sun, Feitong Sun, Jiran Zhang, Junxuan Wu, Qiqian Cang, Yichang Zhang, Fei Huang, Junyang Lin, Fei Huang, Jingren Zhou

    Abstract: In this paper, we introduce PolyMath, a multilingual mathematical reasoning benchmark covering 18 languages and 4 easy-to-hard difficulty levels. Our benchmark ensures difficulty comprehensiveness, language diversity, and high-quality translation, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs. We conduct a comprehensive evaluation for advanced L…

    Submitted 30 April, 2025; v1 submitted 25 April, 2025; originally announced April 2025.

    Comments: Work in Progress

  40. arXiv:2504.17828  [pdf, other]

    cs.CV cs.AI

    VEU-Bench: Towards Comprehensive Understanding of Video Editing

    Authors: Bozheng Li, Yongliang Wu, Yi Lu, Jiashuo Yu, Licheng Tang, Jiawang Cao, Wenqing Zhu, Yuyang Sun, Jay Wu, Wenbo Zhu

    Abstract: Widely shared videos on the internet are often edited. Recently, although Video Large Language Models (Vid-LLMs) have made great progress in general video understanding tasks, their capabilities in video editing understanding (VEU) tasks remain unexplored. To address this gap, in this paper, we introduce VEU-Bench (Video Editing Understanding Benchmark), a comprehensive benchmark that categorizes…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted to CVPR2025

  41. arXiv:2504.17761  [pdf, other]

    cs.CV

    Step1X-Edit: A Practical Framework for General Image Editing

    Authors: Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang

    Abstract: In recent years, image editing models have witnessed remarkable and rapid development. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. These models demonstrate an impressive aptitude for fulfilling a vast majority of user-driven editing requirements, marking a significant advancement in the field of…

    Submitted 6 May, 2025; v1 submitted 24 April, 2025; originally announced April 2025.

    Comments: code: https://github.com/stepfun-ai/Step1X-Edit

  42. arXiv:2504.17410  [pdf, other]

    cs.RO

    Bias-Eliminated PnP for Stereo Visual Odometry: Provably Consistent and Large-Scale Localization

    Authors: Guangyang Zeng, Yuan Shen, Ziyang Hong, Yuze Hong, Viorela Ila, Guodong Shi, Junfeng Wu

    Abstract: In this paper, we first present a bias-eliminated weighted (Bias-Eli-W) perspective-n-point (PnP) estimator for stereo visual odometry (VO) with provable consistency. Specifically, leveraging statistical theory, we develop an asymptotically unbiased and $\sqrt{n}$-consistent PnP estimator that accounts for varying 3D triangulation uncertainties, ensuring that the relative pose estimate converges…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: 10 pages, 7 figures

  43. arXiv:2504.16968  [pdf, other]

    cs.LG cs.AI

    BackSlash: Rate Constrained Optimized Training of Large Language Models

    Authors: Jun Wu, Jiangtao Wen, Yuxing Han

    Abstract: The rapid advancement of large-language models (LLMs) has driven extensive research into parameter compression after training has been completed, yet compression during the training phase remains largely unexplored. In this work, we introduce Rate-Constrained Training (BackSlash), a novel training-time compression approach based on rate-distortion optimization (RDO). BackSlash enables a flexible t…

    Submitted 25 April, 2025; v1 submitted 23 April, 2025; originally announced April 2025.

  44. arXiv:2504.16420  [pdf, other]

    cs.IR cs.AI

    A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms

    Authors: Chengkai Huang, Hongtao Huang, Tong Yu, Kaige Xie, Junda Wu, Shuai Zhang, Julian Mcauley, Dietmar Jannach, Lina Yao

    Abstract: Recommender systems (RS) have become essential in filtering information and personalizing content for users. RS techniques have traditionally relied on modeling interactions between users and items as well as the features of content using models specific to each task. The emergence of foundation models (FMs), large scale models trained on vast amounts of data such as GPT, LLaMA and CLIP, is reshap…

    Submitted 23 April, 2025; originally announced April 2025.

  45. arXiv:2504.16331  [pdf, other]

    cs.SE cs.LG

    ClarifyCoder: Clarification-Aware Fine-Tuning for Programmatic Problem Solving

    Authors: Jie JW Wu, Manav Chaudhary, Davit Abrahamyan, Arhaan Khaku, Anjiang Wei, Fatemeh H. Fard

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, a significant gap remains between their current performance and that of expert software engineers. A key differentiator is that human engineers actively seek clarification when faced with ambiguous requirements, while LLMs typically generate code regardless of uncertainties in the problem desc…

    Submitted 22 April, 2025; originally announced April 2025.

    Comments: 12 pages, 5 figures, 6 tables

  46. arXiv:2504.15928  [pdf, other]

    cs.CV cs.AI

    A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers

    Authors: Meng Wang, Tian Lin, Qingshan Hou, Aidi Lin, Jingcheng Wang, Qingsheng Peng, Truong X. Nguyen, Danqi Fang, Ke Zou, Ting Xu, Cancan Xue, Ten Cheer Quek, Qinkai Yu, Minxin Liu, Hui Zhou, Zixuan Xiao, Guiqin He, Huiyu Liang, Tingkun Shi, Man Chen, Linna Liu, Yuanyuan Peng, Lianyu Wang, Qiuming Hu, Junhong Chen, et al. (15 additional authors not shown)

    Abstract: Artificial intelligence (AI) shows remarkable potential in medical imaging diagnostics, but current models typically require retraining when deployed across different clinical centers, limiting their widespread adoption. We introduce GlobeReady, a clinician-friendly AI platform that enables ocular disease diagnosis without retraining/fine-tuning or technical expertise. GlobeReady achieves high acc…

    Submitted 22 April, 2025; originally announced April 2025.

  47. arXiv:2504.15736  [pdf, other]

    cs.LG stat.ML

    Riemannian Neural Geodesic Interpolant

    Authors: Jiawen Wu, Bingguang Chen, Yuyi Zhou, Qi Meng, Rongchan Zhu, Zhi-Ming Ma

    Abstract: Stochastic interpolants are efficient generative models that bridge two arbitrary probability density functions in finite time, enabling flexible generation from the source to the target distribution or vice versa. These models are primarily developed in Euclidean space, and are therefore limited in their application to many distribution learning problems defined on Riemannian manifolds in real-wo…

    Submitted 22 April, 2025; originally announced April 2025.

  48. arXiv:2504.15619  [pdf, other]

    cs.CV

    AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization

    Authors: Jinda Lu, Jinghan Li, Yuan Gao, Junkang Wu, Jiancan Wu, Xiang Wang, Xiangnan He

    Abstract: Preference alignment through Direct Preference Optimization (DPO) has demonstrated significant effectiveness in aligning multimodal large language models (MLLMs) with human preferences. However, existing methods focus primarily on language preferences while neglecting the critical visual context. In this paper, we propose an Adaptive Vision-enhanced Preference optimization (AdaViP) that addresses…

    Submitted 22 April, 2025; originally announced April 2025.

  49. arXiv:2504.15615  [pdf, ps, other]

    cs.LG stat.ML

    Dimension-Free Decision Calibration for Nonlinear Loss Functions

    Authors: Jingwu Tang, Jiayun Wu, Zhiwei Steven Wu, Jiahao Zhang

    Abstract: When model predictions inform downstream decision making, a natural question is under what conditions can the decision-makers simply respond to the predictions as if they were the true outcomes. Calibration suffices to guarantee that simple best-response to predictions is optimal. However, calibration for high-dimensional prediction outcome spaces requires exponential computational and statistical…

    Submitted 22 April, 2025; originally announced April 2025.

  50. arXiv:2504.15585  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment

    Authors: Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Chengwei Liu, Yifan Zhang, Qiankun Li, et al. (57 additional authors not shown)

    Abstract: The remarkable success of Large Language Models (LLMs) has illuminated a promising pathway toward achieving Artificial General Intelligence for both academic and industrial communities, owing to their unprecedented performance across various applications. As LLMs continue to gain prominence in both research and commercial domains, their security and safety implications have become a growing concer…

    Submitted 22 April, 2025; originally announced April 2025.