Showing 1–50 of 412 results for author: Zheng, B

  1. arXiv:2505.10046  [pdf, ps, other]

    cs.CV

    Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

    Authors: Bingda Tang, Boyang Zheng, Xichen Pan, Sayak Paul, Saining Xie

    Abstract: This paper does not describe a new method; instead, it provides a thorough exploration of an important yet understudied design space related to recent advances in text-to-image synthesis -- specifically, the deep fusion of large language models (LLMs) and diffusion transformers (DiTs) for multi-modal generation. Previous studies mainly focused on overall system performance rather than detailed com…

    Submitted 15 May, 2025; originally announced May 2025.

  2. arXiv:2505.09388  [pdf, other]

    cs.CL

    Qwen3 Technical Report

    Authors: An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, Fan Zhou, Fei Huang, Feng Hu, Hao Ge, Haoran Wei, Huan Lin, Jialong Tang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jing Zhou, et al. (35 additional authors not shown)

    Abstract: In this work, we present Qwen3, the latest version of the Qwen model family. Qwen3 comprises a series of large language models (LLMs) designed to advance performance, efficiency, and multilingual capabilities. The Qwen3 series includes models of both dense and Mixture-of-Expert (MoE) architectures, with parameter scales ranging from 0.6 to 235 billion. A key innovation in Qwen3 is the integration…

    Submitted 14 May, 2025; originally announced May 2025.

  3. arXiv:2505.07294  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    HuB: Learning Extreme Humanoid Balance

    Authors: Tong Zhang, Boyuan Zheng, Ruiqian Nai, Yingdong Hu, Yen-Jen Wang, Geng Chen, Fanqi Lin, Jiongye Li, Chuye Hong, Koushil Sreenath, Yang Gao

    Abstract: The human body demonstrates exceptional motor capabilities, such as standing steadily on one foot or performing a high kick with the leg raised over 1.5 meters, both requiring precise balance control. While recent research on humanoid control has leveraged reinforcement learning to track human motions for skill acquisition, applying this paradigm to balance-intensive tasks remains challenging. In th…

    Submitted 12 May, 2025; originally announced May 2025.

    Comments: Project website: https://hub-robot.github.io

  4. arXiv:2505.07197  [pdf, ps, other]

    cs.IR

    A Generative Re-ranking Model for List-level Multi-objective Optimization at Taobao

    Authors: Yue Meng, Cheng Guo, Yi Cao, Tong Liu, Bo Zheng

    Abstract: E-commerce recommendation systems aim to generate ordered lists of items for customers, optimizing multiple business objectives, such as clicks, conversions and Gross Merchandise Volume (GMV). Traditional multi-objective optimization methods like formulas or Learning-to-rank (LTR) models take effect at item-level, neglecting dynamic user intent and contextual item interactions. List-level multi-ob…

    Submitted 11 May, 2025; originally announced May 2025.

  5. arXiv:2505.06708  [pdf, ps, other]

    cs.CL

    Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free

    Authors: Zihan Qiu, Zekun Wang, Bo Zheng, Zeyu Huang, Kaiyue Wen, Songlin Yang, Rui Men, Le Yu, Fei Huang, Suozhi Huang, Dayiheng Liu, Jingren Zhou, Junyang Lin

    Abstract: Gating mechanisms have been widely utilized, from early models like LSTMs and Highway Networks to recent state space models, linear attention, and also softmax attention. Yet, existing literature rarely examines the specific effects of gating. In this work, we conduct comprehensive experiments to systematically investigate gating-augmented softmax attention variants. Specifically, we perform a com…

    Submitted 10 May, 2025; originally announced May 2025.

  6. arXiv:2505.05336  [pdf, other]

    cs.CV

    Progressive Inertial Poser: Progressive Real-Time Kinematic Chain Estimation for 3D Full-Body Pose from Three IMU Sensors

    Authors: Zunjie Zhu, Yan Zhao, Yihan Hu, Guoxiang Wang, Hai Qiu, Bolun Zheng, Chenggang Yan, Feng Xu

    Abstract: The motion capture system that supports full-body virtual representation is of key significance for virtual reality. Compared to vision-based systems, full-body pose estimation from sparse tracking signals is not limited by environmental conditions or recording range. However, previous works either face the challenge of wearing additional sensors on the pelvis and lower-body or rely on external vi…

    Submitted 8 May, 2025; originally announced May 2025.

  7. arXiv:2505.05098  [pdf, ps, other]

    cs.RO cs.CL cs.CV cs.ET

    X-Driver: Explainable Autonomous Driving with Vision-Language Models

    Authors: Wei Liu, Jiyuan Zhang, Binxiong Zheng, Yufeng Hu, Yingzhan Lin, Zengfeng Zeng

    Abstract: End-to-end autonomous driving has advanced significantly, offering benefits such as system simplicity and stronger driving performance in both open-loop and closed-loop settings than conventional pipelines. However, existing frameworks still suffer from low success rates in closed-loop evaluations, highlighting their limitations in real-world deployment. In this paper, we introduce X-Driver, a uni…

    Submitted 8 May, 2025; originally announced May 2025.

  8. arXiv:2504.21282  [pdf, ps, other]

    cs.DB

    Birdie: Natural Language-Driven Table Discovery Using Differentiable Search Index

    Authors: Yuxiang Guo, Zhonghao Hu, Yuren Mao, Baihua Zheng, Yunjun Gao, Mingwei Zhou

    Abstract: Natural language (NL)-driven table discovery identifies relevant tables from large table repositories based on NL queries. While current deep-learning-based methods using the traditional dense vector search pipeline, i.e., representation-index-search, achieve remarkable accuracy, they face several limitations that impede further performance improvements: (i) the errors accumulated during the table…

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted by VLDB 2025

  9. arXiv:2504.20384  [pdf, other]

    cs.CV

    FiLA-Video: Spatio-Temporal Compression for Fine-Grained Long Video Understanding

    Authors: Yanan Guo, Wenhui Dong, Jun Song, Shiding Zhu, Xuan Zhang, Hanqing Yang, Yingbo Wang, Yang Du, Xianing Chen, Bo Zheng

    Abstract: Recent advancements in video understanding within visual large language models (VLLMs) have led to notable progress. However, the complexity of video data and contextual processing limitations still hinder long-video comprehension. A common approach is video feature compression to reduce token input to large language models, yet many methods either fail to prioritize essential features, leading to…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: 8 pages, 6 figures

  10. arXiv:2504.19438  [pdf, other]

    eess.IV cs.CV

    Dual Attention Driven Lumbar Magnetic Resonance Image Feature Enhancement and Automatic Diagnosis of Herniation

    Authors: Lingrui Zhang, Liang Guo, Xiao An, Feng Lin, Binlong Zheng, Jiankun Wang, Zhirui Li

    Abstract: Lumbar disc herniation (LDH) is a common musculoskeletal disease that requires magnetic resonance imaging (MRI) for effective clinical management. However, the interpretation of MRI images heavily relies on the expertise of radiologists, leading to delayed diagnosis and high costs for training physicians. Therefore, this paper proposes an innovative automated LDH classification framework. To addre…

    Submitted 27 April, 2025; originally announced April 2025.

    Comments: 9 pages, 7 figures

  11. arXiv:2504.12597  [pdf, other]

    cs.CL

    GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning

    Authors: Liangyu Xu, Yingxiu Zhao, Jingyun Wang, Yingyao Wang, Bu Pi, Chen Wang, Mingliang Zhang, Jihao Gu, Xiang Li, Xiaoyong Zhu, Jun Song, Bo Zheng

    Abstract: Geometry problem-solving (GPS), a challenging task requiring both visual comprehension and symbolic reasoning, effectively measures the reasoning capabilities of multimodal large language models (MLLMs). Humans exhibit strong reasoning ability in this task through accurate identification and adaptive application of geometric principles within visual contexts. However, existing benchmarks fail to j…

    Submitted 23 April, 2025; v1 submitted 16 April, 2025; originally announced April 2025.

    Comments: 10 pages, 8 figures

  12. arXiv:2504.12364  [pdf, other]

    cs.CV

    DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging

    Authors: Tianhui Song, Weixin Feng, Shuai Wang, Xubin Li, Tiezheng Ge, Bo Zheng, Limin Wang

    Abstract: The success of text-to-image (T2I) generation models has spurred a proliferation of numerous model checkpoints fine-tuned from the same base model on various specialized datasets. This overwhelming specialized model production introduces new challenges for high parameter redundancy and huge storage cost, thereby necessitating the development of effective methods to consolidate and unify the capabi…

    Submitted 16 April, 2025; originally announced April 2025.

  13. arXiv:2504.10074  [pdf, other]

    cs.AI

    MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework

    Authors: Zihan Ling, Zhiyao Guo, Yixuan Huang, Yi An, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Bo Zheng

    Abstract: Recent advancements in large language models (LLMs) and multi-modal LLMs have been remarkable. However, these models still rely solely on their parametric knowledge, which limits their ability to generate up-to-date information and increases the risk of producing erroneous content. Retrieval-Augmented Generation (RAG) partially mitigates these challenges by incorporating external data sources, yet…

    Submitted 20 April, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  14. arXiv:2504.07079  [pdf, other]

    cs.AI cs.CL cs.CV

    SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills

    Authors: Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Zora Zhiruo Wang, Apurva Gandhi, Yueqi Song, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, Yu Su

    Abstract: To survive and thrive in complex environments, humans have evolved sophisticated self-improvement mechanisms through environment exploration, hierarchical abstraction of experiences into reusable skills, and collaborative construction of an ever-growing skill repertoire. Despite recent advancements, autonomous web agents still lack crucial self-improvement capabilities, struggling with procedural…

    Submitted 9 April, 2025; originally announced April 2025.

  15. arXiv:2504.06632  [pdf, other]

    cs.CV

    PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering

    Authors: Yifan Gao, Zihang Lin, Chuanbin Liu, Min Zhou, Tiezheng Ge, Bo Zheng, Hongtao Xie

    Abstract: Product posters, which integrate subject, scene, and text, are crucial promotional tools for attracting customers. Creating such posters using modern image generation methods is valuable, while the main challenge lies in accurately rendering text, especially for complex writing systems like Chinese, which contains over 10,000 individual characters. In this work, we identify the key to precise text…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: Accepted by CVPR 2025. Project Page: https://poster-maker.github.io

  16. arXiv:2504.05321  [pdf, other]

    cs.IR cs.AI cs.LG

    VALUE: Value-Aware Large Language Model for Query Rewriting via Weighted Trie in Sponsored Search

    Authors: Boyang Zuo, Xiao Zhang, Feng Li, Pengjie Wang, Jian Xu, Bo Zheng

    Abstract: In the realm of sponsored search advertising, matching advertisements with the search intent of a user's query is crucial. Query-to-bidwords (i.e., bidding keywords) rewriting is a vital technique that has garnered significant attention. Recently, with the prevalence of LLMs, generative retrieval methods have proven effective in producing high-relevance rewrites. However, we have identified a signif…

    Submitted 25 February, 2025; originally announced April 2025.

  17. arXiv:2504.04405  [pdf, other]

    cs.IR cs.AI

    Universal Item Tokenization for Transferable Generative Recommendation

    Authors: Bowen Zheng, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Recently, generative recommendation has emerged as a promising paradigm, attracting significant research attention. The basic framework involves an item tokenizer, which represents each item as a sequence of codes serving as its identifier, and a generative recommender that predicts the next item by autoregressively generating the target item identifier. However, in existing methods, both the toke…

    Submitted 13 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  18. arXiv:2504.04400  [pdf, other]

    cs.IR cs.AI

    Pre-training Generative Recommender with Multi-Identifier Item Tokenization

    Authors: Bowen Zheng, Enze Liu, Zhongfu Chen, Zhongrui Ma, Yue Wang, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: Generative recommendation autoregressively generates item identifiers to recommend potential items. Existing methods typically adopt a one-to-one mapping strategy, where each item is represented by a single identifier. However, this scheme poses issues, such as suboptimal semantic modeling for low-frequency items and limited diversity in token sequence data. To overcome these limitations, we propo…

    Submitted 13 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  19. Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization

    Authors: Zhanda Zhu, Christina Giannoula, Muralidhar Andoorveedu, Qidong Su, Karttikeya Mangalam, Bojian Zheng, Gennady Pekhimenko

    Abstract: Various forms of parallelism, such as data, tensor, and pipeline parallelism, along with memory optimizations like activation checkpointing, redundancy elimination, and offloading, have been proposed to accelerate distributed training for Large Language Models. To find the best combination of these techniques, automatic distributed training systems are proposed. However, existing systems only tune a subset…

    Submitted 24 March, 2025; originally announced March 2025.

    Comments: Accepted by EuroSys 2025

  20. arXiv:2503.18240  [pdf, other]

    cs.IT eess.SP

    A Tutorial on Six-Dimensional Movable Antenna for 6G Networks: Synergizing Positionable and Rotatable Antennas

    Authors: Xiaodan Shao, Weidong Mei, Changsheng You, Qingqing Wu, Beixiong Zheng, Cheng-Xiang Wang, Junling Li, Rui Zhang, Robert Schober, Lipeng Zhu, Weihua Zhuang, Xuemin Shen

    Abstract: Six-dimensional movable antenna (6DMA) is a new and revolutionary technique that fully exploits the wireless channel spatial variations at the transmitter/receiver by flexibly adjusting the three-dimensional (3D) positions and/or 3D rotations of antennas/antenna surfaces (sub-arrays), thereby improving the performance of wireless networks cost-effectively without the need to deploy addit…

    Submitted 7 May, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: 46 pages, submitted to IEEE for publication

  21. arXiv:2503.17407  [pdf, other]

    cs.CL cs.LG

    A Comprehensive Survey on Long Context Language Modeling

    Authors: Jiaheng Liu, Dawei Zhu, Zhiqi Bai, Yancheng He, Huanxuan Liao, Haoran Que, Zekun Wang, Chenchen Zhang, Ge Zhang, Jiebin Zhang, Yuanxing Zhang, Zhuo Chen, Hangyu Guo, Shilong Li, Ziqiang Liu, Yong Shan, Yifan Song, Jiayi Tian, Wenhao Wu, Zhejian Zhou, Ruijie Zhu, Junlan Feng, Yang Gao, Shizhu He, Zhoujun Li, et al. (12 additional authors not shown)

    Abstract: Efficient processing of long contexts has been a persistent pursuit in Natural Language Processing. With the growing number of long documents, dialogues, and other textual data, it is important to develop Long Context Language Models (LCLMs) that can process and analyze extensive inputs in an effective and efficient way. In this paper, we present a comprehensive survey on recent advances in long-c…

    Submitted 20 March, 2025; originally announced March 2025.

  22. arXiv:2503.17097  [pdf, other]

    cs.CV

    R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model

    Authors: Boyuan Zheng, Shouyi Lu, Renbo Huang, Minqing Huang, Fan Lu, Wei Tian, Guirong Zhuo, Lu Xiong

    Abstract: We introduce R2LDM, an innovative approach for generating dense and accurate 4D radar point clouds, guided by corresponding LiDAR point clouds. Instead of utilizing range images or bird's eye view (BEV) images, we represent both LiDAR and 4D radar point clouds using voxel features, which more effectively capture 3D shape information. Subsequently, we propose the Latent Voxel Diffusion Model (LVDM)…

    Submitted 21 March, 2025; originally announced March 2025.

  23. arXiv:2503.16385  [pdf, ps, other]

    cs.AI

    Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation

    Authors: Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, GengRu Chen, Wenbo Su, Bo Zheng

    Abstract: Recent advancements in large language models (LLMs) have demonstrated remarkable reasoning capabilities through long chain-of-thought (CoT) reasoning. The R1 distillation scheme has emerged as a promising approach for training cost-effective models with enhanced reasoning abilities. However, the underlying mechanisms driving its effectiveness remain unclear. This study examines the universality of…

    Submitted 20 March, 2025; originally announced March 2025.

  24. arXiv:2503.15990  [pdf, other]

    cs.CL

    ECKGBench: Benchmarking Large Language Models in E-commerce Leveraging Knowledge Graph

    Authors: Langming Liu, Haibin Chen, Yuhao Wang, Yujin Yuan, Shilei Liu, Wenbo Su, Xiangyu Zhao, Bo Zheng

    Abstract: Large language models (LLMs) have demonstrated their capabilities across various NLP tasks. Their potential in e-commerce is also substantial, evidenced by practical implementations such as platform search, personalized recommendations, and customer service. One primary concern associated with LLMs is their factuality (e.g., hallucination), which is urgent in e-commerce due to its significant impa…

    Submitted 20 March, 2025; originally announced March 2025.

  25. arXiv:2503.12478  [pdf, other]

    cs.LG cs.AI cs.DB

    KDSelector: A Knowledge-Enhanced and Data-Efficient Model Selector Learning Framework for Time Series Anomaly Detection

    Authors: Zhiyu Liang, Dongrui Cai, Chenyuan Zhang, Zheng Liang, Chen Liang, Bo Zheng, Shi Qiu, Jin Wang, Hongzhi Wang

    Abstract: Model selection has been raised as an essential problem in the area of time series anomaly detection (TSAD), because there is no single best TSAD model for the highly heterogeneous time series in real-world applications. However, despite the success of existing model selection solutions that train a classification model (especially neural network, NN) using historical data as a selector to predict…

    Submitted 19 March, 2025; v1 submitted 16 March, 2025; originally announced March 2025.

    Comments: This paper has been accepted by SIGMOD 2025

  26. arXiv:2503.12183  [pdf, other]

    cs.IR

    Bridging Textual-Collaborative Gap through Semantic Codes for Sequential Recommendation

    Authors: Enze Liu, Bowen Zheng, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: In recent years, substantial research efforts have been devoted to enhancing sequential recommender systems by integrating abundant side information with ID-based collaborative information. This study specifically focuses on leveraging the textual metadata (e.g., titles and brands) associated with items. While existing methods have achieved notable success by combining text and ID representations,…

    Submitted 15 March, 2025; originally announced March 2025.

  27. arXiv:2503.11441  [pdf, other]

    cs.LG

    D3: Diversity, Difficulty, and Dependability-Aware Data Selection for Sample-Efficient LLM Instruction Tuning

    Authors: Jia Zhang, Chen-Xi Zhang, Yao Liu, Yi-Xuan Jin, Xiao-Wen Yang, Bo Zheng, Yi Liu, Lan-Zhe Guo

    Abstract: Recent advancements in instruction tuning for large language models (LLMs) suggest that a small, high-quality dataset can significantly equip LLMs with instruction-following capabilities, outperforming large datasets often burdened by quality and redundancy issues. However, the challenge lies in automatically identifying valuable subsets from large datasets to boost both the effectiveness and effi…

    Submitted 14 March, 2025; originally announced March 2025.

  28. arXiv:2503.10472  [pdf, ps, other]

    eess.SP cs.IT

    Rotatable Antennas for Integrated Sensing and Communications

    Authors: Chao Zhou, Changsheng You, Beixiong Zheng, Xiaodan Shao, Rui Zhang

    Abstract: In this letter, we propose to deploy rotatable antennas (RAs) at the base station (BS) to enhance both communication and sensing (C&S) performances, by exploiting a new spatial degree-of-freedom (DoF) offered by array rotation. Specifically, we formulate a multi-objective optimization problem to simultaneously maximize the sum-rate of multiple communication users and minimize the Cramér-Rao bound…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: This work is submitted to IEEE for possible publication

  29. arXiv:2503.10304  [pdf, other]

    cs.LG cs.AI cs.GT

    Nash Equilibrium Constrained Auto-bidding With Bi-level Reinforcement Learning

    Authors: Zhiyu Mou, Miao Xu, Rongquan Bai, Zhuoran Yang, Chuan Yu, Jian Xu, Bo Zheng

    Abstract: Many online advertising platforms provide advertisers with auto-bidding services to enhance their advertising performance. However, most existing auto-bidding algorithms fail to accurately capture the auto-bidding problem formulation that the platform truly faces, let alone solve it. Actually, we argue that the platform should try to help optimize each advertiser's performance to the greatest exte…

    Submitted 13 March, 2025; originally announced March 2025.

  30. arXiv:2503.09527  [pdf, other]

    cs.CV cs.AI

    CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

    Authors: Peng Chen, Pi Bu, Yingyao Wang, Xinyi Wang, Ziming Wang, Jie Guo, Yingxiu Zhao, Qi Zhu, Jun Song, Siran Yang, Jiamang Wang, Bo Zheng

    Abstract: Recent advances in Vision-Language-Action models (VLAs) have expanded the capabilities of embodied intelligence. However, significant challenges remain in real-time decision-making in complex 3D environments, which demand second-level responses, high-resolution perception, and tactical reasoning under dynamic conditions. To advance the field, we introduce CombatVLA, an efficient VLA model optimize…

    Submitted 12 March, 2025; originally announced March 2025.

  31. arXiv:2503.04446  [pdf, other]

    cs.SI cs.MM

    SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity

    Authors: Yijie Xu, Bolun Zheng, Wei Zhu, Hangjia Pan, Yuchen Yao, Ning Xu, Anan Liu, Quan Zhang, Chenggang Yan

    Abstract: The social media popularity prediction task aims to predict the popularity of posts on social media platforms, which has a positive driving effect on application scenarios such as content optimization, digital marketing and online advertising. Though many studies have made significant progress, few of them pay much attention to the integration of popularity prediction with temporal alignment. In…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  32. arXiv:2503.03438  [pdf, other]

    cs.LG

    Gradient Deconfliction via Orthogonal Projections onto Subspaces For Multi-task Learning

    Authors: Shijie Zhu, Hui Zhao, Tianshu Wu, Pengjie Wang, Hongbo Deng, Jian Xu, Bo Zheng

    Abstract: Although multi-task learning (MTL) has been a preferred approach and successfully applied in many real-world scenarios, MTL models are not guaranteed to outperform single-task models on all tasks mainly due to the negative effects of conflicting gradients among the tasks. In this paper, we fully examine the influence of conflicting gradients and further emphasize the importance and advantages of a…

    Submitted 5 March, 2025; originally announced March 2025.

    Comments: WSDM 2025

  33. arXiv:2503.02918  [pdf, other]

    cs.LG cs.AI

    Straight-Line Diffusion Model for Efficient 3D Molecular Generation

    Authors: Yuyan Ni, Shikun Feng, Haohan Chi, Bowen Zheng, Huan-ang Gao, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

    Abstract: Diffusion-based models have shown great promise in molecular generation but often require a large number of sampling steps to generate valid samples. In this paper, we introduce a novel Straight-Line Diffusion Model (SLDM) to tackle this problem, by formulating the diffusion process to follow a linear trajectory. The proposed process aligns well with the noise sensitivity characteristic of molecul…

    Submitted 4 March, 2025; originally announced March 2025.

  34. arXiv:2503.00823  [pdf, other]

    cs.CV

    Task-Agnostic Guided Feature Expansion for Class-Incremental Learning

    Authors: Bowen Zheng, Da-Wei Zhou, Han-Jia Ye, De-Chuan Zhan

    Abstract: The ability to learn new concepts while preserving the learned knowledge is desirable for learning systems in Class-Incremental Learning (CIL). Recently, feature expansion of the model has become a prevalent solution for CIL, where the old features are fixed during the training of the new task while new features are expanded for the new tasks. However, such task-specific features learned from the new ta…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: Accepted to CVPR2025

  35. arXiv:2503.00747  [pdf, other]

    cs.CV cs.RO eess.IV

    Unifying Light Field Perception with Field of Parallax

    Authors: Fei Teng, Buyin Deng, Boyuan Zheng, Kai Luo, Kunyu Peng, Jiaming Zhang, Kailun Yang

    Abstract: Field of Parallax (FoP) is a spatial field that distills the common features from different LF representations to provide flexible and consistent support for multi-task learning. FoP is built upon three core features--projection difference, adjacency divergence, and contextual consistency--which are essential for cross-task adaptability. To implement FoP, we design a two-step angular adapter: the f…

    Submitted 2 March, 2025; originally announced March 2025.

    Comments: The source code will be made publicly available at https://github.com/warriordby/LFX

  36. arXiv:2502.20196  [pdf, other]

    cs.CL

    ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models

    Authors: Haibin Chen, Kangtao Lv, Chengwei Hu, Yanshi Li, Yujin Yuan, Yancheng He, Xingyao Zhang, Langming Liu, Shilei Liu, Wenbo Su, Bo Zheng

    Abstract: With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information within the complex e-commerce applications. Therefore, it is necessary to build an e-commerce concept benchmark. Existing benchmarks encounter two primary c…

    Submitted 27 February, 2025; originally announced February 2025.

  37. arXiv:2502.19361  [pdf, other]

    cs.CL

    Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning?

    Authors: Yancheng He, Shilong Li, Jiaheng Liu, Weixun Wang, Xingyuan Bu, Ge Zhang, Zhongyuan Peng, Zhaoxiang Zhang, Zhicheng Zheng, Wenbo Su, Bo Zheng

    Abstract: Recently, o1-like models have drawn significant attention, where these models produce the long Chain-of-Thought (CoT) reasoning steps to improve the reasoning abilities of existing Large Language Models (LLMs). In this paper, to understand the qualities of these long CoTs and measure the critique abilities of existing LLMs on these long CoTs, we introduce the DeltaBench, including the generated lo…

    Submitted 30 March, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: The first four authors contributed equally, 27 pages

  38. arXiv:2502.19178  [pdf, other]

    cs.IR

    UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering

    Authors: Langming Liu, Shilei Liu, Yujin Yuan, Yizhen Zhang, Bencheng Yan, Zhiyuan Zeng, Zihao Wang, Jiaqi Liu, Di Wang, Wenbo Su, Pengjie Wang, Jian Xu, Bo Zheng

    Abstract: Large language models (LLMs) achieve remarkable success in natural language processing (NLP). In practical scenarios like recommendations, as users increasingly seek personalized experiences, it becomes crucial to incorporate user interaction history into the context of LLMs to enhance personalization. However, from a practical utility perspective, user interactions' extensive length and noise pre…

    Submitted 1 April, 2025; v1 submitted 26 February, 2025; originally announced February 2025.

    Comments: 10 pages, 3 figures, 7 tables

  39. arXiv:2502.17787  [pdf, other]

    cs.CL cs.AI

    AIR: Complex Instruction Generation via Automatic Iterative Refinement

    Authors: Wei Liu, Yancheng He, Hui Huang, Chengwei Hu, Jiaheng Liu, Shilong Li, Wenbo Su, Bo Zheng

    Abstract: With the development of large language models, their ability to follow simple instructions has significantly improved. However, adhering to complex instructions remains a major challenge. Current approaches to generating complex instructions are often irrelevant to the current instruction requirements or suffer from limited scalability and diversity. Moreover, methods such as back-translation, whi…

    Submitted 27 February, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

    Comments: The first three authors contributed equally, 20 pages

  40. arXiv:2502.14744  [pdf, other]

    cs.CL

    HiddenDetect: Detecting Jailbreak Attacks against Large Vision-Language Models via Monitoring Hidden States

    Authors: Yilei Jiang, Xinyan Gao, Tianshuo Peng, Yingshui Tan, Xiaoyong Zhu, Bo Zheng, Xiangyu Yue

    Abstract: The integration of additional modalities increases the susceptibility of large vision-language models (LVLMs) to safety risks, such as jailbreak attacks, compared to their language-only counterparts. While existing research primarily focuses on post-hoc alignment techniques, the underlying safety mechanisms within LVLMs remain largely unexplored. In this work, we investigate whether LVLMs inheren…

    Submitted 21 April, 2025; v1 submitted 20 February, 2025; originally announced February 2025.

  41. arXiv:2502.11718  [pdf, other]

    cs.CL cs.CV

    ChineseSimpleVQA -- "See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

    Authors: Jihao Gu, Yingyao Wang, Pi Bu, Chen Wang, Ziming Wang, Tengtao Song, Donglai Wei, Jiale Yuan, Yingxiu Zhao, Yancheng He, Shilong Li, Jiaheng Liu, Meng Cao, Jun Song, Yingshui Tan, Xiang Li, Wenbo Su, Zhicheng Zheng, Xiaoyong Zhu, Bo Zheng

    Abstract: The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major t…

    Submitted 26 February, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 24 pages, 21 figures

  42. arXiv:2502.11555  [pdf, other]

    cs.AI

    Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models

    Authors: Yingshui Tan, Yilei Jiang, Yanshi Li, Jiaheng Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng

    Abstract: Fine-tuning large language models (LLMs) based on human preferences, commonly achieved through reinforcement learning from human feedback (RLHF), has been effective in improving their performance. However, maintaining LLM safety throughout the fine-tuning process remains a significant challenge, as resolving conflicts between safety and helpfulness can be non-trivial. Typically, the safety alignme…

    Submitted 17 February, 2025; originally announced February 2025.

  43. arXiv:2502.08309  [pdf, other]

    cs.IR

    Unlocking Scaling Law in Industrial Recommendation Systems with a Three-step Paradigm based Large User Model

    Authors: Bencheng Yan, Shilei Liu, Zhiyuan Zeng, Zihao Wang, Yizhen Zhang, Yujin Yuan, Langming Liu, Jiaqi Liu, Di Wang, Wenbo Su, Wang Pengjie, Jian Xu, Bo Zheng

    Abstract: Recent advancements in autoregressive Large Language Models (LLMs) have achieved significant milestones, largely attributed to their scalability, often referred to as the "scaling law". Inspired by these achievements, there has been a growing interest in adapting LLMs for Recommendation Systems (RecSys) by reformulating RecSys tasks into generative problems. However, these End-to-End Generative Re…

    Submitted 12 February, 2025; originally announced February 2025.

  44. arXiv:2502.05454  [pdf, other]

    cs.RO cs.LG

    Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following

    Authors: Vivek Myers, Bill Chunyuan Zheng, Anca Dragan, Kuan Fang, Sergey Levine

    Abstract: Effective task representations should facilitate compositionality, such that after learning a variety of basic tasks, an agent can perform compound tasks consisting of multiple steps simply by composing the representations of the constituent steps together. While this is conceptually simple and appealing, it is not clear how to automatically learn representations that enable this sort of compositi…

    Submitted 13 February, 2025; v1 submitted 8 February, 2025; originally announced February 2025.

  45. arXiv:2502.05187  [pdf, other]

    cs.GT cs.LG

    An Adaptable Budget Planner for Enhancing Budget-Constrained Auto-Bidding in Online Advertising

    Authors: Zhijian Duan, Yusen Huo, Tianyu Wang, Zhilin Zhang, Yeshu Li, Chuan Yu, Jian Xu, Bo Zheng, Xiaotie Deng

    Abstract: In online advertising, advertisers commonly utilize auto-bidding services to bid for impression opportunities. A typical objective of the auto-bidder is to optimize the advertiser's cumulative value of winning impressions within specified budget constraints. However, such a problem is challenging due to the complex bidding environment faced by diverse advertisers. To address this challenge, we int…

    Submitted 26 January, 2025; originally announced February 2025.

    Comments: In KDD 2025 ADS Track August

  46. arXiv:2502.04399  [pdf, other]

    cs.LG cs.AI eess.SY

    Online Location Planning for AI-Defined Vehicles: Optimizing Joint Tasks of Order Serving and Spatio-Temporal Heterogeneous Model Fine-Tuning

    Authors: Bokeng Zheng, Bo Rao, Tianxiang Zhu, Chee Wei Tan, Jingpu Duan, Zhi Zhou, Xu Chen, Xiaoxi Zhang

    Abstract: Advances in artificial intelligence (AI), including foundation models (FMs), are increasingly transforming human society, with smart city driving the evolution of urban living. Meanwhile, vehicle crowdsensing (VCS) has emerged as a key enabler, leveraging vehicles' mobility and sensor-equipped capabilities. In particular, ride-hailing vehicles can effectively facilitate flexible data collection and…

    Submitted 6 February, 2025; originally announced February 2025.

  47. arXiv:2502.03041  [pdf, other]

    cs.IR cs.LG

    Large Language Models Are Universal Recommendation Learners

    Authors: Junguang Jiang, Yanwen Huang, Bin Liu, Xiaoyu Kong, Ziru Xu, Han Zhu, Jian Xu, Bo Zheng

    Abstract: In real-world recommender systems, different tasks are typically addressed using supervised learning on task-specific datasets with carefully designed model architectures. We demonstrate that large language models (LLMs) can function as universal recommendation learners, capable of handling multiple tasks within a unified input-output framework, eliminating the need for specialized model designs.…

    Submitted 5 February, 2025; originally announced February 2025.

  48. Harmful Terms and Where to Find Them: Measuring and Modeling Unfavorable Financial Terms and Conditions in Shopping Websites at Scale

    Authors: Elisa Tsai, Neal Mangaokar, Boyuan Zheng, Haizhong Zheng, Atul Prakash

    Abstract: Terms and conditions for online shopping websites often contain terms that can have significant financial consequences for customers. Despite their impact, there is currently no comprehensive understanding of the types and potential risks associated with unfavorable financial terms. Furthermore, there are no publicly available detection systems or datasets to systematically identify or mitigate th…

    Submitted 3 February, 2025; originally announced February 2025.

    Comments: This paper has been accepted to The Web Conference 2025 (WWW '25)

    ACM Class: H.3.3; K.4.1; K.4.2; I.2.7

  49. arXiv:2502.00321  [pdf, other]

    cs.IR cs.AI

    MIM: Multi-modal Content Interest Modeling Paradigm for User Behavior Modeling

    Authors: Bencheng Yan, Si Chen, Shichang Jia, Jianyu Liu, Yueran Liu, Chenghan Fu, Wanxian Guan, Hui Zhao, Xiang Zhang, Kai Zhang, Wenbo Su, Pengjie Wang, Jian Xu, Bo Zheng, Baolin Liu

    Abstract: Click-Through Rate (CTR) prediction is a crucial task in recommendation systems, online searches, and advertising platforms, where accurately capturing users' real interests in content is essential for performance. However, existing methods heavily rely on ID embeddings, which fail to reflect users' true preferences for content such as images and titles. This limitation becomes particularly eviden…

    Submitted 23 February, 2025; v1 submitted 1 February, 2025; originally announced February 2025.

  50. DAGPrompT: Pushing the Limits of Graph Prompting with a Distribution-aware Graph Prompt Tuning Approach

    Authors: Qin Chen, Liang Wang, Bo Zheng, Guojie Song

    Abstract: The pre-train then fine-tune approach has advanced GNNs by enabling general knowledge capture without task-specific labels. However, an objective gap between pre-training and downstream tasks limits its effectiveness. Recent graph prompting methods aim to close this gap through task reformulations and learnable prompts. Despite this, they struggle with complex graphs like heterophily graphs. Freez…

    Submitted 25 January, 2025; originally announced January 2025.

    Comments: To be published in WWW '25, April 28-May 2, 2025, Sydney, NSW, Australia