Showing 1–50 of 654 results for author: Wen, J

Searching in archive cs.
  1. arXiv:2506.22523  [pdf]

    cs.CY cs.AI

    Red Teaming for Generative AI, Report on a Copyright-Focused Exercise Completed in an Academic Medical Center

    Authors: James Wen, Sahil Nalawade, Zhiwei Liang, Catherine Bielick, Marisa Ferrara Boston, Alexander Chowdhury, Adele Collin, Luigi De Angelis, Jacob Ellen, Heather Frase, Rodrigo R. Gameiro, Juan Manuel Gutierrez, Pooja Kadam, Murat Keceli, Srikanth Krishnamurthy, Anne Kwok, Yanan Lance Lu, Heather Mattie, Liam G. McCoy, Katherine Miller, Allison C. Morgan, Marlene Louisa Moerig, Trang Nguyen, Alexander Owen-Post, Alex D. Ruiz, et al. (16 additional authors not shown)

    Abstract: Background: Generative artificial intelligence (AI) deployment in academic medical settings raises copyright compliance concerns. Dana-Farber Cancer Institute implemented GPT4DFCI, an internal generative AI tool utilizing OpenAI models, that is approved for enterprise use in research and operations. Given (1) the exceptionally broad adoption of the tool in our organization, (2) our research missio…

    Submitted 2 July, 2025; v1 submitted 26 June, 2025; originally announced June 2025.

  2. arXiv:2506.19889  [pdf, ps, other]

    cs.CR cs.AI

    Retrieval-Confused Generation is a Good Defender for Privacy Violation Attack of Large Language Models

    Authors: Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen

    Abstract: Recent advances in large language models (LLMs) have made a profound impact on our society and also raised new security concerns. Particularly, due to the remarkable inference ability of LLMs, the privacy violation attack (PVA), revealed by Staab et al., introduces serious personal privacy issues. Existing defense methods mainly leverage LLMs to anonymize the input query, which requires costly inf…

    Submitted 24 June, 2025; originally announced June 2025.

  3. arXiv:2506.15947  [pdf, ps, other]

    cs.NI eess.SP

    HybridRAG-based LLM Agents for Low-Carbon Optimization in Low-Altitude Economy Networks

    Authors: Jinbo Wen, Cheng Su, Jiawen Kang, Jiangtian Nie, Yang Zhang, Jianhang Tang, Dusit Niyato, Chau Yuen

    Abstract: Low-Altitude Economy Networks (LAENets) are emerging as a promising paradigm to support various low-altitude services through integrated air-ground infrastructure. To satisfy low-latency and high-computation demands, the integration of Unmanned Aerial Vehicles (UAVs) with Mobile Edge Computing (MEC) systems plays a vital role, which offloads computing tasks from terminal devices to nearby UAVs, en…

    Submitted 18 June, 2025; originally announced June 2025.

  4. arXiv:2506.15703  [pdf, ps, other]

    cs.LG cs.AI

    Federated Incomplete Multi-view Clustering with Globally Fused Graph Guidance

    Authors: Guoqing Chao, Zhenghao Zhang, Lei Meng, Jie Wen, Dianhui Chu

    Abstract: Federated multi-view clustering has been proposed to mine the valuable information within multi-view data distributed across different devices and has achieved impressive results while preserving the privacy. Despite great progress, most federated multi-view clustering methods only used global pseudo-labels to guide the downstream clustering process and failed to exploit the global information whe…

    Submitted 29 May, 2025; originally announced June 2025.

  5. arXiv:2506.13315  [pdf, ps, other]

    cs.IR

    Gated Rotary-Enhanced Linear Attention for Long-term Sequential Recommendation

    Authors: Juntao Hu, Wei Zhou, Huayi Shen, Xiao Du, Jie Liao, Junhao Wen, Min Gao

    Abstract: In Sequential Recommendation Systems (SRSs), Transformer models show remarkable performance but face computation cost challenges when modeling long-term user behavior sequences due to the quadratic complexity of the dot-product attention mechanism. By approximating the dot-product attention, linear attention provides an efficient option with linear complexity. However, existing linear attention me…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 24 pages, 9 figures

  6. arXiv:2506.11521  [pdf, ps, other]

    cs.CR cs.AI cs.MM

    Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models

    Authors: Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li

    Abstract: Multimodal large language models (MLLMs), which bridge the gap between audio-visual and natural language processing, achieve state-of-the-art performance on several audio-visual tasks. Despite the superior performance of MLLMs, the scarcity of high-quality audio-visual training data and computational resources necessitates the utilization of third-party data and open-source MLLMs, a trend that is…

    Submitted 13 June, 2025; originally announced June 2025.

  7. arXiv:2506.10821  [pdf, ps, other]

    cs.CV cs.AI cs.CL

    VideoDeepResearch: Long Video Understanding With Agentic Tool Using

    Authors: Huaying Yuan, Zheng Liu, Junjie Zhou, Hongjin Qian, Ji-Rong Wen, Zhicheng Dou

    Abstract: Long video understanding (LVU) presents a significant challenge for current multi-modal large language models (MLLMs) due to the task's inherent complexity and context window constraint. It is widely assumed that addressing LVU tasks requires foundation MLLMs with extended context windows, strong visual perception capabilities, and proficient domain expertise. In this work, we challenge this commo…

    Submitted 15 June, 2025; v1 submitted 12 June, 2025; originally announced June 2025.

  8. arXiv:2506.10139  [pdf, ps, other]

    cs.CL cs.AI

    Unsupervised Elicitation of Language Models

    Authors: Jiaxin Wen, Zachary Ankner, Arushi Somani, Peter Hase, Samuel Marks, Jacob Goldman-Wetzler, Linda Petrini, Henry Sleight, Collin Burns, He He, Shi Feng, Ethan Perez, Jan Leike

    Abstract: To steer pretrained language models for downstream tasks, today's post-training paradigm relies on humans to specify desired behaviors. However, for models with superhuman capabilities, it is difficult or impossible to get high-quality human supervision. To address this challenge, we introduce a new unsupervised algorithm, Internal Coherence Maximization (ICM), to fine-tune pretrained language mod…

    Submitted 11 June, 2025; originally announced June 2025.

  9. arXiv:2506.08708  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly

    Authors: Liang Ma, Jiajun Wen, Min Lin, Rongtao Xu, Xiwen Liang, Bingqian Lin, Jun Ma, Yongxin Wang, Ziming Wei, Haokun Lin, Mingfei Han, Meng Cao, Bokui Chen, Ivan Laptev, Xiaodan Liang

    Abstract: While vision-language models (VLMs) have demonstrated promising capabilities in reasoning and planning for embodied agents, their ability to comprehend physical phenomena, particularly within structured 3D environments, remains severely limited. To close this gap, we introduce PhyBlock, a progressive benchmark designed to assess VLMs on physical understanding and planning through robotic 3D block…

    Submitted 10 June, 2025; originally announced June 2025.

  10. arXiv:2506.07963  [pdf, ps, other]

    cs.AI cs.CL cs.CV

    Reinforcing Multimodal Understanding and Generation with Dual Self-rewards

    Authors: Jixiang Hong, Yiran Zhang, Guanzhong Wang, Yi Liu, Ji-Rong Wen, Rui Yan

    Abstract: Building upon large language models (LLMs), recent large multimodal models (LMMs) unify cross-model understanding and generation into a single framework. However, LMMs still struggle to achieve accurate image-text alignment, prone to generating text responses contradicting the visual input or failing to follow the text-to-image prompts. Current solutions require external supervision (e.g., human f…

    Submitted 12 June, 2025; v1 submitted 9 June, 2025; originally announced June 2025.

  11. arXiv:2506.04894  [pdf, ps, other]

    cs.CL

    ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests

    Authors: Shiyi Xu, Yiwen Hu, Yingqian Min, Zhipeng Chen, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: With the significant progress of large reasoning models in complex coding and reasoning tasks, existing benchmarks, like LiveCodeBench and CodeElo, are insufficient to evaluate the coding capabilities of large language models (LLMs) in real competition environments. Moreover, current evaluation metrics such as Pass@K fail to capture the reflective abilities of reasoning models. To address these ch…

    Submitted 5 June, 2025; originally announced June 2025.

  12. arXiv:2506.02875  [pdf, ps, other]

    cs.CV

    NTIRE 2025 XGC Quality Assessment Challenge: Methods and Results

    Authors: Xiaohong Liu, Xiongkuo Min, Qiang Hu, Xiaoyun Zhang, Jie Guo, Guangtao Zhai, Shushi Wang, Yingjie Zhou, Lu Liu, Jingxin Li, Liu Yang, Farong Wen, Li Xu, Yanwei Jiang, Xilei Zhu, Chunyi Li, Zicheng Zhang, Huiyu Duan, Xiele Wu, Yixuan Gao, Yuqin Cao, Jun Jia, Wei Sun, Jiezhang Cao, Radu Timofte, et al. (70 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2025 XGC Quality Assessment Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2025. This challenge is to address a major challenge in the field of video and talking head processing. The challenge is divided into three tracks, including user generated video, AI generated video and talking he…

    Submitted 3 June, 2025; originally announced June 2025.

    Comments: NTIRE 2025 XGC Quality Assessment Challenge Report. arXiv admin note: text overlap with arXiv:2404.16687

  13. arXiv:2506.00930  [pdf, ps, other]

    cs.AI cs.CL

    Aligning VLM Assistants with Personalized Situated Cognition

    Authors: Yongqi Li, Shen Zhou, Xiaohu Li, Xin Miao, Jintao Wen, Mayi Xu, Jianhao Chen, Birong Pan, Hankun Kang, Yuanyuan Zhu, Ming Zhong, Tieyun Qian

    Abstract: Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants of humans in managing visual tasks. However, people with diversified backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM…

    Submitted 1 June, 2025; originally announced June 2025.

    Comments: Accepted to ACL 2025 (main), camera-ready version

  14. arXiv:2506.00915  [pdf, ps, other]

    cs.CV

    3D Skeleton-Based Action Recognition: A Review

    Authors: Mengyuan Liu, Hong Liu, Qianshuo Hu, Bin Ren, Junsong Yuan, Jiaying Lin, Jiajun Wen

    Abstract: With the inherent advantages of skeleton representation, 3D skeleton-based action recognition has become a prominent topic in the field of computer vision. However, previous reviews have predominantly adopted a model-oriented perspective, often neglecting the fundamental steps involved in skeleton-based action recognition. This oversight tends to ignore key components of skeleton-based action reco…

    Submitted 1 June, 2025; originally announced June 2025.

  15. arXiv:2506.00794  [pdf, ps, other]

    cs.AI

    Predicting Empirical AI Research Outcomes with Language Models

    Authors: Jiaxin Wen, Chenglei Si, Yueh-han Chen, He He, Shi Feng

    Abstract: Many promising-looking ideas in AI research fail to deliver, but their validation takes substantial human labor and compute. Predicting an idea's chance of success is thus crucial for accelerating empirical AI research, a skill that even expert researchers can only acquire through substantial experience. We build the first benchmark for this task and compare LMs with human experts. Concretely, giv…

    Submitted 31 May, 2025; originally announced June 2025.

  16. arXiv:2506.00486  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    It Takes a Good Model to Train a Good Model: Generalized Gaussian Priors for Optimized LLMs

    Authors: Jun Wu, Yirong Xiong, Jiangtao Wen, Yuxing Han

    Abstract: Despite rapid advancements in the research and deployment of large language models (LLMs), the statistical distribution of model parameters, as well as their influence on initialization, training dynamics, and downstream efficiency, has received surprisingly little attention. A recent work introduced BackSlash, a training-time compression algorithm. It first demonstrated that pre-trained LLM param…

    Submitted 4 June, 2025; v1 submitted 31 May, 2025; originally announced June 2025.

  17. arXiv:2505.24480  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Effective Code-Integrated Reasoning

    Authors: Fei Bai, Yingqian Min, Beichen Zhang, Zhipeng Chen, Wayne Xin Zhao, Lei Fang, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen

    Abstract: In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external code tools effectively, which is supported by tool-augmented reinforcement learning (RL) through interactive learning. Despite its benefits, tool-augmented RL…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Technical Report on Slow Thinking with LLMs: Code-Integrated Reasoning

  18. arXiv:2505.23111  [pdf, ps, other]

    cs.RO

    Redundancy Parameterization of the ABB YuMi Robot Arm

    Authors: Alexander J. Elias, John T. Wen

    Abstract: The ABB YuMi is a 7-DOF collaborative robot arm with a complex, redundant kinematic structure. Path planning for the YuMi is challenging, especially with joint limits considered. The redundant degree of freedom is parameterized by the Shoulder-Elbow-Wrist (SEW) angle, called the arm angle by ABB, but the exact definition must be known for path planning outside the RobotStudio simulator. We provide…

    Submitted 29 May, 2025; originally announced May 2025.

    Comments: 8 pages, 5 figures

  19. arXiv:2505.21906  [pdf, ps, other]

    cs.RO cs.AI cs.CV

    ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge

    Authors: Zhongyi Zhou, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu

    Abstract: Vision-language-action (VLA) models have emerged as the next generation of models in robotics. However, despite leveraging powerful pre-trained Vision-Language Models (VLMs), existing end-to-end VLA systems often lose key capabilities during fine-tuning as the model adapts to specific robotic tasks. We argue that a generalizable VLA model should retain and expand upon the VLM's core competencies:…

    Submitted 29 May, 2025; v1 submitted 27 May, 2025; originally announced May 2025.

    Comments: Project page: https://chatvla-2.github.io/

  20. arXiv:2505.20245  [pdf, ps, other]

    cs.CL cs.AI

    KnowTrace: Bootstrapping Iterative Retrieval-Augmented Generation with Structured Knowledge Tracing

    Authors: Rui Li, Quanyu Dai, Zeyu Zhang, Xu Chen, Zhenhua Dong, Ji-Rong Wen

    Abstract: Recent advances in retrieval-augmented generation (RAG) furnish large language models (LLMs) with iterative retrievals of relevant information to handle complex multi-hop questions. These methods typically alternate between LLM reasoning and retrieval to accumulate external information into the LLM's context. However, the ever-growing context inherently imposes an increasing burden on the LLM to p…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Accepted by KDD 2025

  21. arXiv:2505.19877  [pdf, other]

    cs.CV

    Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought

    Authors: Chao Huang, Benfeng Wang, Jie Wen, Chengliang Liu, Wei Wang, Li Shen, Xiaochun Cao

    Abstract: Recent advancements in reasoning capability of Multimodal Large Language Models (MLLMs) demonstrate its effectiveness in tackling complex visual tasks. However, existing MLLM-based Video Anomaly Detection (VAD) methods remain limited to shallow anomaly descriptions without deep reasoning. In this paper, we propose a new task named Video Anomaly Reasoning (VAR), which aims to enable deep analysis a…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: 9 pages, 4 figures

  22. arXiv:2505.19223  [pdf, ps, other]

    cs.LG

    LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models

    Authors: Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, Jun Hu, Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li

    Abstract: While Masked Diffusion Models (MDMs), such as LLaDA, present a promising paradigm for language modeling, there has been relatively little effort in aligning these models with human preferences via reinforcement learning. The challenge primarily arises from the high variance in Evidence Lower Bound (ELBO)-based likelihood estimates required for preference optimization. To address this issue, we pro…

    Submitted 25 May, 2025; originally announced May 2025.

  23. arXiv:2505.19126  [pdf, ps, other]

    cs.CL

    MMATH: A Multilingual Benchmark for Mathematical Reasoning

    Authors: Wenyang Luo, Wayne Xin Zhao, Jing Sha, Shijin Wang, Ji-Rong Wen

    Abstract: The advent of large reasoning models, such as OpenAI o1 and DeepSeek R1, has significantly advanced complex reasoning tasks. However, their capabilities in multilingual complex reasoning remain underexplored, with existing efforts largely focused on simpler tasks like MGSM. To address this gap, we introduce MMATH, a benchmark for multilingual complex reasoning spanning 374 high-quality math proble…

    Submitted 25 May, 2025; originally announced May 2025.

  24. arXiv:2505.19017  [pdf, ps, other]

    cs.RO cs.CV cs.LG

    WorldEval: World Model as Real-World Robot Policies Evaluator

    Authors: Yaxuan Li, Yichen Zhu, Junjie Wen, Chaomin Shen, Yi Xu

    Abstract: The field of robotics has made significant strides toward developing generalist robot manipulation policies. However, evaluating these policies in real-world scenarios remains time-consuming and challenging, particularly as the number of tasks scales and environmental conditions change. In this work, we demonstrate that world models can serve as a scalable, reproducible, and reliable proxy for rea…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: The project page is available at https://worldeval.github.io

  25. arXiv:2505.18051  [pdf, other]

    cs.CV

    LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

    Authors: Anthony Fuller, Yousef Yassin, Junfeng Wen, Daniel G. Kyrollos, Tarek Ibrahim, James R. Green, Evan Shelhamer

    Abstract: Vision transformers are ever larger, more accurate, and more expensive to compute. The expense is even more extreme at high resolution as the number of tokens grows quadratically with the image size. We turn to adaptive computation to cope with this cost by learning to predict where to compute. Our LookWhere method divides the computation between a low-resolution selector and a high-resolution ext…

    Submitted 23 May, 2025; originally announced May 2025.

  26. arXiv:2505.17005  [pdf, ps, other]

    cs.CL cs.AI cs.IR

    R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

    Authors: Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen

    Abstract: Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods often are costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal an…

    Submitted 22 May, 2025; originally announced May 2025.

  27. arXiv:2505.16933  [pdf, ps, other]

    cs.LG cs.CL cs.CV

    LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

    Authors: Zebin You, Shen Nie, Xiaolu Zhang, Jun Hu, Jun Zhou, Zhiwu Lu, Ji-Rong Wen, Chongxuan Li

    Abstract: In this work, we introduce LLaDA-V, a purely diffusion-based Multimodal Large Language Model (MLLM) that integrates visual instruction tuning with masked diffusion models, representing a departure from the autoregressive paradigms dominant in current multimodal approaches. Built upon LLaDA, a representative large language diffusion model, LLaDA-V incorporates a vision encoder and MLP connector tha…

    Submitted 4 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: Project page and codes: https://ml-gsai.github.io/LLaDA-V-demo/

  28. arXiv:2505.16865  [pdf, ps, other]

    cs.IR

    LARES: Latent Reasoning for Sequential Recommendation

    Authors: Enze Liu, Bowen Zheng, Xiaolei Wang, Wayne Xin Zhao, Jinpeng Wang, Sheng Chen, Ji-Rong Wen

    Abstract: Sequential recommender systems have become increasingly important in real-world applications that model user behavior sequences to predict their preferences. However, existing sequential recommendation methods predominantly rely on non-reasoning paradigms, which may limit the model's computational capacity and result in suboptimal recommendation performance. To address these limitations, we presen…

    Submitted 4 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  29. arXiv:2505.16834  [pdf, other]

    cs.CL cs.AI cs.IR

    SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis

    Authors: Shuang Sun, Huatong Song, Yuhao Wang, Ruiyang Ren, Jinhao Jiang, Junjie Zhang, Fei Bai, Jia Deng, Wayne Xin Zhao, Zheng Liu, Lei Fang, Zhongyuan Wang, Ji-Rong Wen

    Abstract: Retrieval-augmented generation (RAG) systems have advanced large language models (LLMs) in complex deep search scenarios requiring multi-step reasoning and iterative information retrieval. However, existing approaches face critical limitations that lack high-quality training trajectories or suffer from the distributional mismatches in simulated environments and prohibitive computational costs for…

    Submitted 25 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  30. arXiv:2505.16810  [pdf, other]

    cs.IR

    DeepRec: Towards a Deep Dive Into the Item Space with Large Language Model Based Recommendation

    Authors: Bowen Zheng, Xiaolei Wang, Enze Liu, Xi Wang, Lu Hongyu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Recently, large language models (LLMs) have been introduced into recommender systems (RSs), either to enhance traditional recommendation models (TRMs) or serve as recommendation backbones. However, existing LLM-based RSs often do not fully exploit the complementary advantages of LLMs (e.g., world knowledge and reasoning) and TRMs (e.g., recommendation-specific knowledge and efficiency) to fully ex…

    Submitted 26 May, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

  31. arXiv:2505.16410  [pdf, other]

    cs.CL cs.AI cs.LG

    Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

    Authors: Guanting Dong, Yifei Chen, Xiaoxi Li, Jiajie Jin, Hongjin Qian, Yutao Zhu, Hangyu Mao, Guorui Zhou, Zhicheng Dou, Ji-Rong Wen

    Abstract: Recently, large language models (LLMs) have shown remarkable reasoning capabilities via large-scale reinforcement learning (RL). However, leveraging the RL algorithm to empower effective multi-tool collaborative reasoning in LLMs remains an open challenge. In this paper, we introduce Tool-Star, an RL-based framework designed to empower LLMs to autonomously invoke multiple external tools during ste…

    Submitted 22 May, 2025; originally announced May 2025.

    Comments: Work in progress

  32. arXiv:2505.15444  [pdf, ps, other]

    cs.CL cs.AI

    Single LLM, Multiple Roles: A Unified Retrieval-Augmented Generation Framework Using Role-Specific Token Optimization

    Authors: Yutao Zhu, Jiajie Jin, Hongjin Qian, Zheng Liu, Zhicheng Dou, Ji-Rong Wen

    Abstract: Existing studies have optimized retrieval-augmented generation (RAG) across various sub-tasks, such as query understanding and retrieval refinement, but integrating these optimizations into a unified framework remains challenging. To tackle this problem, this work proposes RoleRAG, a unified RAG framework that achieves efficient multi-task processing through role-specific token optimization. RoleR…

    Submitted 21 May, 2025; originally announced May 2025.

  33. arXiv:2505.14680  [pdf, ps, other]

    cs.IR cs.AI cs.CL cs.HC

    NExT-Search: Rebuilding User Feedback Ecosystem for Generative AI Search

    Authors: Sunhao Dai, Wenjie Wang, Liang Pang, Jun Xu, See-Kiong Ng, Ji-Rong Wen, Tat-Seng Chua

    Abstract: Generative AI search is reshaping information retrieval by offering end-to-end answers to complex queries, reducing users' reliance on manually browsing and summarizing multiple web pages. However, while this paradigm enhances convenience, it disrupts the feedback-driven improvement loop that has historically powered the evolution of traditional Web search. Web search can continuously improve thei…

    Submitted 20 May, 2025; originally announced May 2025.

    Comments: SIGIR 2025 Perspective Paper

  34. arXiv:2505.12710  [pdf, ps, other]

    cs.LG cs.NI

    Confidence-Regulated Generative Diffusion Models for Reliable AI Agent Migration in Vehicular Metaverses

    Authors: Yingkai Kang, Jiawen Kang, Jinbo Wen, Tao Zhang, Zhaohui Yang, Dusit Niyato, Yan Zhang

    Abstract: Vehicular metaverses are an emerging paradigm that merges intelligent transportation systems with virtual spaces, leveraging advanced digital twin and Artificial Intelligence (AI) technologies to seamlessly integrate vehicles, users, and digital environments. In this paradigm, vehicular AI agents are endowed with environment perception, decision-making, and action execution capabilities, enabling…

    Submitted 19 May, 2025; originally announced May 2025.

  35. arXiv:2505.12641  [pdf, ps, other]

    cs.CV cs.AI

    Single Image Reflection Removal via inter-layer Complementarity

    Authors: Yue Huang, Zi'ang Li, Tianle Hu, Jie Wen, Guanbin Li, Jinglin Zhang, Guoxu Zhou, Xiaozhao Fang

    Abstract: Although dual-stream architectures have achieved remarkable success in single image reflection removal, they fail to fully exploit inter-layer complementarity in their physical modeling and network design, which limits the quality of image separation. To address this fundamental limitation, we propose two targeted improvements to enhance dual-stream architectures: First, we introduce a novel inter…

    Submitted 18 May, 2025; originally announced May 2025.

  36. arXiv:2505.12408  [pdf, ps, other]

    cs.CV cs.AI cs.HC

    ViEEG: Hierarchical Neural Coding with Cross-Modal Progressive Enhancement for EEG-Based Visual Decoding

    Authors: Minxu Liu, Donghai Guan, Chuhang Zheng, Chunwei Tian, Jie Wen, Qi Zhu

    Abstract: Understanding and decoding brain activity into visual representations is a fundamental challenge at the intersection of neuroscience and artificial intelligence. While EEG-based visual decoding has shown promise due to its non-invasive, low-cost nature and millisecond-level temporal resolution, existing methods are limited by their reliance on flat neural representations that overlook the brain's…

    Submitted 25 May, 2025; v1 submitted 18 May, 2025; originally announced May 2025.

    Comments: 24 pages, 18 figures

  37. arXiv:2505.11932  [pdf, other]

    cs.CL cs.IR

    Neuro-Symbolic Query Compiler

    Authors: Yuyao Zhang, Zhicheng Dou, Xiaoxi Li, Jiajie Jin, Yongkang Wu, Zhonghua Li, Qi Ye, Ji-Rong Wen

    Abstract: Precise recognition of search intent in Retrieval-Augmented Generation (RAG) systems remains a challenging goal, especially under resource constraints and for complex queries with nested structures and dependencies. This paper presents QCompiler, a neuro-symbolic framework inspired by linguistic grammar rules and compiler design, to bridge this gap. It theoretically designs a minimal yet sufficien…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: Findings of ACL2025, codes are available at this url: https://github.com/YuyaoZhangQAQ/Query_Compiler

  38. arXiv:2505.09590  [pdf, ps, other]

    cs.IR

    Distance-aware Self-adaptive Graph Convolution for Fine-grained Hierarchical Recommendation

    Authors: Tao Huang, Yihong Chen, Wei Fan, Wei Zhou, Junhao Wen

    Abstract: Graph Convolutional Networks (GCNs) are widely used to improve recommendation accuracy and performance by effectively learning the representations of user and item nodes. However, two major challenges remain: (1) the lack of further optimization in the graph representation structure and (2) insufficient attention given to the varying contributions of different convolutional layers. This paper propo…

    Submitted 14 May, 2025; originally announced May 2025.

  39. arXiv:2505.09528  [pdf, other]

    cs.CV

    Conformal Bounds on Full-Reference Image Quality for Imaging Inverse Problems

    Authors: Jeffrey Wen, Rizwan Ahmad, Philip Schniter

    Abstract: In imaging inverse problems, we would like to know how close the recovered image is to the true image in terms of full-reference image quality (FRIQ) metrics like PSNR, SSIM, LPIPS, etc. This is especially important in safety-critical applications like medical imaging, where knowing that, say, the SSIM was poor could potentially avoid a costly misdiagnosis. But since we don't know the true image,…

    Submitted 14 May, 2025; originally announced May 2025.

    Journal ref: Transactions on Machine Learning Research, May 2025

  40. arXiv:2505.07581  [pdf, other]

    cs.AI cs.CY

    YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models

    Authors: Lei Wang, Heyang Gao, Xiaohe Bo, Xu Chen, Ji-Rong Wen

    Abstract: Leveraging large language model (LLM) based agents to simulate human social behaviors has recently gained significant attention. In this paper, we introduce a novel social simulator called YuLan-OneSim. Compared to previous works, YuLan-OneSim distinguishes itself in five key aspects: (1) Code-free scenario construction: Users can simply describe and refine their simulation scenarios through natur…

    Submitted 22 May, 2025; v1 submitted 12 May, 2025; originally announced May 2025.

  41. arXiv:2505.07290  [pdf, other]

    cs.NI

    Multi-Agent DRL for Multi-Objective Twin Migration Routing with Workload Prediction in 6G-enabled IoV

    Authors: Peng Yin, Wentao Liang, Jinbo Wen, Jiawen Kang, Junlong Chen, Dusit Niyato

    Abstract: Sixth Generation (6G)-enabled Internet of Vehicles (IoV) facilitates efficient data synchronization through ultra-fast bandwidth and high-density connectivity, enabling the emergence of Vehicle Twins (VTs). As highly accurate replicas of vehicles, VTs can support intelligent vehicular applications for occupants in 6G-enabled IoV. Thanks to the full coverage capability of 6G, resource-constrained v…

    Submitted 12 May, 2025; originally announced May 2025.

  42. arXiv:2505.04364  [pdf, other]

    cs.MA cs.CL

    Benchmarking LLMs' Swarm intelligence

    Authors: Kai Ruan, Mowen Huang, Ji-Rong Wen, Hao Sun

    Abstract: Large Language Models (LLMs) show potential for complex reasoning, yet their capacity for emergent coordination in Multi-Agent Systems (MAS) when operating under strict swarm-like constraints (limited local perception and communication) remains largely unexplored. Existing benchmarks often do not fully capture the unique challenges of decentralized coordination when agents operate with incomplete sp…

    Submitted 28 May, 2025; v1 submitted 7 May, 2025; originally announced May 2025.

    Comments: added new ref

  43. arXiv:2505.01969  [pdf, other]

    cs.CV

    MC3D-AD: A Unified Geometry-aware Reconstruction Model for Multi-category 3D Anomaly Detection

    Authors: Jiayi Cheng, Can Gao, Jie Zhou, Jiajun Wen, Tao Dai, Jinbao Wang

    Abstract: 3D Anomaly Detection (AD) is a promising means of controlling the quality of manufactured products. However, existing methods typically require carefully training a task-specific model for each category independently, leading to high cost, low efficiency, and weak generalization. Therefore, this paper presents a novel unified model for Multi-Category 3D Anomaly Detection (MC3D-AD) that aims to uti…

    Submitted 3 May, 2025; originally announced May 2025.

    Comments: 7 pages of main text, 3 pages of appendix, accepted to IJCAI 2025

  44. arXiv:2505.00662  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepCritic: Deliberate Critique with Large Language Models

    Authors: Wenkai Yang, Jingwen Chen, Yankai Lin, Ji-Rong Wen

    Abstract: As Large Language Models (LLMs) are rapidly evolving, providing accurate feedback and scalable oversight on their outputs becomes an urgent and critical problem. Leveraging LLMs as critique models to achieve automated supervision is a promising solution. In this work, we focus on studying and enhancing the math critique ability of LLMs. Current LLM critics provide critiques that are too shallow an…

    Submitted 1 May, 2025; originally announced May 2025.

    Comments: Work in progress. Data and models are available at https://github.com/RUCBM/DeepCritic

  45. arXiv:2504.21776  [pdf, other]

    cs.CL cs.AI cs.IR

    WebThinker: Empowering Large Reasoning Models with Deep Research Capability

    Authors: Xiaoxi Li, Jiajie Jin, Guanting Dong, Hongjin Qian, Yutao Zhu, Yongkang Wu, Ji-Rong Wen, Zhicheng Dou

    Abstract: Large reasoning models (LRMs), such as OpenAI-o1 and DeepSeek-R1, demonstrate impressive long-horizon reasoning capabilities. However, their reliance on static internal knowledge limits their performance on complex, knowledge-intensive tasks and hinders their ability to produce comprehensive research reports requiring synthesis of diverse web information. To address this, we propose \textbf{WebThi…

    Submitted 30 April, 2025; originally announced April 2025.

  46. arXiv:2504.21019  [pdf, other]

    cs.CL cs.AI

    Kill two birds with one stone: generalized and robust AI-generated text detection via dynamic perturbations

    Authors: Yinghan Zhou, Juan Wen, Wanli Peng, Yiming Xue, Ziwei Zhang, Zhengxian Wu

    Abstract: The growing popularity of large language models has raised concerns regarding the potential to misuse AI-generated text (AIGT). It becomes increasingly critical to establish an excellent AIGT detection method with high generalization and robustness. However, existing methods either focus on model generalization or concentrate on robustness. The unified mechanism, to simultaneously address the chal…

    Submitted 21 April, 2025; originally announced April 2025.

    Comments: Accepted by NAACL 2025 main conference

  47. arXiv:2504.20458  [pdf, other]

    cs.IR cs.CL

    Search-Based Interaction For Conversation Recommendation via Generative Reward Model Based Simulated User

    Authors: Xiaolei Wang, Chunxuan Xia, Junyi Li, Fanzhe Meng, Lei Huang, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Conversational recommendation systems (CRSs) use multi-turn interaction to capture user preferences and provide personalized recommendations. A fundamental challenge in CRSs lies in effectively understanding user preferences from conversations. User preferences can be multifaceted and complex, posing significant challenges for accurate recommendations even with access to abundant external knowledg…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by SIGIR 2025

  48. arXiv:2504.19433  [pdf, other]

    cs.CR

    GTSD: Generative Text Steganography Based on Diffusion Model

    Authors: Zhengxian Wu, Juan Wen, Yiming Xue, Ziwei Zhang, Yinghan Zhou

    Abstract: With the rapid development of deep learning, existing generative text steganography methods based on autoregressive models have achieved success. However, these autoregressive steganography approaches have certain limitations. Firstly, existing methods require encoding candidate words according to their output probability and generating each stego word one by one, which makes the generation proces…

    Submitted 27 April, 2025; originally announced April 2025.

    Journal ref: ICONIP 2024

  49. arXiv:2504.18782  [pdf, other]

    cs.CV cs.MM

    CAMeL: Cross-modality Adaptive Meta-Learning for Text-based Person Retrieval

    Authors: Hang Yu, Jiahao Wen, Zhedong Zheng

    Abstract: Text-based person retrieval aims to identify specific individuals within an image database using textual descriptions. Due to the high cost of annotation and privacy protection, researchers resort to synthesized data for the paradigm of pretraining and fine-tuning. However, these generated data often exhibit domain biases in both images and textual annotations, which largely compromise the scalabi…

    Submitted 25 April, 2025; originally announced April 2025.

  50. arXiv:2504.16968  [pdf, other]

    cs.LG cs.AI

    BackSlash: Rate Constrained Optimized Training of Large Language Models

    Authors: Jun Wu, Jiangtao Wen, Yuxing Han

    Abstract: The rapid advancement of large-language models (LLMs) has driven extensive research into parameter compression after training has been completed, yet compression during the training phase remains largely unexplored. In this work, we introduce Rate-Constrained Training (BackSlash), a novel training-time compression approach based on rate-distortion optimization (RDO). BackSlash enables a flexible t…

    Submitted 26 May, 2025; v1 submitted 23 April, 2025; originally announced April 2025.