Showing 1–50 of 309 results for author: Yin, D

Searching in archive cs.
  1. arXiv:2507.04072 [pdf, ps, other]

    cs.IR

    CTR-Guided Generative Query Suggestion in Conversational Search

    Authors: Erxue Min, Hsiu-Yuan Huang, Xihong Yang, Min Yang, Xin Jia, Yunfang Wu, Hengyi Cai, Junfeng Wang, Shuaiqiang Wang, Dawei Yin

    Abstract: Generating effective query suggestions in conversational search requires aligning model outputs with user preferences, which is challenging due to sparse and noisy click signals. We propose GQS, a generative framework that integrates click modeling and preference optimization to enhance real-world user engagement. GQS consists of three key components: (1) a Multi-Source CTR Modeling module that ca…

    Submitted 5 July, 2025; originally announced July 2025.

  2. arXiv:2507.01006 [pdf, ps, other]

    cs.CV cs.AI cs.LG

    GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

    Authors: GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi, Junhui Ji, Lihang Pan, Shuaiqi Duan, Weihan Wang, Yan Wang, Yean Cheng, Zehai He, Zhe Su, Zhen Yang, Ziyang Pan, Aohan Zeng, Baoxu Wang, Boyan Shi, Changyu Pang, Chenhui Zhang, et al. (54 additional authors not shown)

    Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the fi…

    Submitted 2 July, 2025; v1 submitted 1 July, 2025; originally announced July 2025.

  3. arXiv:2506.17881 [pdf, ps, other]

    cs.CL cs.AI

    Multi-turn Jailbreaking via Global Refinement and Active Fabrication

    Authors: Hua Tang, Lingyong Yan, Yukun Zhao, Shuaiqiang Wang, Jizhou Huang, Dawei Yin

    Abstract: Large Language Models (LLMs) have achieved exceptional performance across a wide range of tasks. However, they still pose significant safety risks due to the potential misuse for malicious purposes. Jailbreaks, which aim to elicit models to generate harmful content, play a critical role in identifying the underlying security threats. Recent jailbreaking primarily focuses on single-turn scenarios,…

    Submitted 21 June, 2025; originally announced June 2025.

  4. arXiv:2506.17188 [pdf, ps, other]

    cs.CL cs.AI cs.IR

    Towards AI Search Paradigm

    Authors: Yuchen Li, Hengyi Cai, Rui Kong, Xinran Chen, Jiamin Chen, Jun Yang, Haojie Zhang, Jiayi Li, Jiayi Wu, Yiqun Chen, Changle Qu, Keyi Kong, Wenwen Ye, Lixin Su, Xinyu Ma, Long Xia, Daiting Shi, Jiashu Zhao, Haoyi Xiong, Shuaiqiang Wang, Dawei Yin

    Abstract: In this paper, we introduce the AI Search Paradigm, a comprehensive blueprint for next-generation search systems capable of emulating human information processing and decision-making. The paradigm employs a modular architecture of four LLM-powered agents (Master, Planner, Executor and Writer) that dynamically adapt to the full spectrum of information needs, from simple factual queries to complex m…

    Submitted 20 June, 2025; originally announced June 2025.

  5. arXiv:2506.15677 [pdf, ps, other]

    cs.AI cs.CL cs.CV cs.MM cs.RO

    Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence

    Authors: Yining Hong, Rui Sun, Bingxuan Li, Xingcheng Yao, Maxine Wu, Alexander Chien, Da Yin, Ying Nian Wu, Zhecan James Wang, Kai-Wei Chang

    Abstract: AI agents today are mostly siloed - they either retrieve and reason over vast amount of digital information and knowledge obtained online; or interact with the physical world through embodied perception, planning and action - but rarely both. This separation limits their ability to solve tasks that require integrated physical and digital intelligence, such as cooking from online recipes, navigatin…

    Submitted 19 June, 2025; v1 submitted 18 June, 2025; originally announced June 2025.

  6. arXiv:2506.09544 [pdf, ps, other]

    cs.LG

    STOAT: Spatial-Temporal Probabilistic Causal Inference Network

    Authors: Yang Yang, Du Yin, Hao Xue, Flora Salim

    Abstract: Spatial-temporal causal time series (STC-TS) involve region-specific temporal observations driven by causally relevant covariates and interconnected across geographic or network-based spaces. Existing methods often model spatial and temporal dynamics independently and overlook causality-driven probabilistic forecasting, limiting their predictive power. To address this, we propose STOAT (Spatial-Te…

    Submitted 12 June, 2025; v1 submitted 11 June, 2025; originally announced June 2025.

  7. arXiv:2506.08626 [pdf, ps, other]

    cs.IR

    Leveraging LLMs to Evaluate Usefulness of Document

    Authors: Xingzhu Wang, Erhan Zhang, Yiqun Chen, Jinghan Xuan, Yucheng Hou, Yitong Xu, Ying Nie, Shuaiqiang Wang, Dawei Yin, Jiaxin Mao

    Abstract: The conventional Cranfield paradigm struggles to effectively capture user satisfaction due to its weak correlation between relevance and satisfaction, alongside the high costs of relevance annotation in building test collections. To tackle these issues, our research explores the potential of leveraging large language models (LLMs) to generate multilevel usefulness labels for evaluation. We introdu…

    Submitted 10 June, 2025; v1 submitted 10 June, 2025; originally announced June 2025.

  8. arXiv:2506.07905 [pdf, ps, other]

    cs.CV

    WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning

    Authors: Jie Yang, Feipeng Ma, Zitian Wang, Dacheng Yin, Kang Rong, Fengyun Rao, Ruimao Zhang

    Abstract: Building on the success of text-based reasoning models like DeepSeek-R1, extending these capabilities to multimodal reasoning holds great promise. While recent works have attempted to adapt DeepSeek-R1-style reinforcement learning (RL) training paradigms to multimodal large language models (MLLM), focusing on domain-specific tasks like math and visual perception, a critical question remains: How c…

    Submitted 9 June, 2025; originally announced June 2025.

  9. arXiv:2506.04039 [pdf, ps, other]

    cs.CV cs.AI cs.CL

    Mitigating Hallucinations in Large Vision-Language Models via Entity-Centric Multimodal Preference Optimization

    Authors: Jiulong Wu, Zhengliang Shi, Shuaiqiang Wang, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao, Min Zhang

    Abstract: Large Visual Language Models (LVLMs) have demonstrated impressive capabilities across multiple tasks. However, their trustworthiness is often challenged by hallucinations, which can be attributed to the modality misalignment and the inherent hallucinations of their underlying Large Language Models (LLMs) backbone. Existing preference alignment methods focus on aligning model responses with human p…

    Submitted 4 June, 2025; originally announced June 2025.

  10. arXiv:2506.02404 [pdf, ps, other]

    cs.CL cs.AI

    GraphRAG-Bench: Challenging Domain-Specific Reasoning for Evaluating Graph Retrieval-Augmented Generation

    Authors: Yilin Xiao, Junnan Dong, Chuang Zhou, Su Dong, Qian-wen Zhang, Di Yin, Xing Sun, Xiao Huang

    Abstract: Graph Retrieval Augmented Generation (GraphRAG) has garnered increasing recognition for its potential to enhance large language models (LLMs) by structurally organizing domain-specific corpora and facilitating complex reasoning. However, current evaluations of GraphRAG models predominantly rely on traditional question-answering datasets. Their limited scope in questions and evaluation metrics fail…

    Submitted 19 June, 2025; v1 submitted 2 June, 2025; originally announced June 2025.

  11. arXiv:2505.24251 [pdf, other]

    cs.CL cs.IR

    Proactive Guidance of Multi-Turn Conversation in Industrial Search

    Authors: Xiaoyu Li, Xiao Li, Li Gao, Yiding Liu, Xiaoyang Wang, Shuaiqiang Wang, Junfeng Wang, Dawei Yin

    Abstract: The evolution of Large Language Models (LLMs) has significantly advanced multi-turn conversation systems, emphasizing the need for proactive guidance to enhance users' interactions. However, these systems face challenges in dynamically adapting to shifts in users' goals and maintaining low latency for real-time interactions. In the Baidu Search AI assistant, an industrial-scale multi-turn search s…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: ACL'25 (Industry)

  12. arXiv:2505.21413 [pdf, ps, other]

    cs.CL cs.AI

    RefTool: Enhancing Model Reasoning with Reference-Guided Tool Creation

    Authors: Xiao Liu, Da Yin, Zirui Wu, Yansong Feng

    Abstract: Tools enhance the reasoning capabilities of large language models (LLMs) in complex problem-solving tasks, but not all tasks have available tools. In the absence of predefined tools, prior works have explored instructing LLMs to generate tools on their own. However, such approaches rely heavily on the models' internal knowledge and would fail in domains beyond the LLMs' knowledge scope. To address…

    Submitted 27 May, 2025; originally announced May 2025.

    Comments: Code is available at https://github.com/xxxiaol/RefTool

  13. arXiv:2505.20128 [pdf, other]

    cs.CL

    Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers

    Authors: Zhengliang Shi, Lingyong Yan, Dawei Yin, Suzan Verberne, Maarten de Rijke, Zhaochun Ren

    Abstract: Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. However, effectively enabling LLMs to seek accurate knowledge in complex tasks remains a challenge due to the complexity of multi-hop queries as well as the irrelevant retrieved content. To address these limitations, we propose EXSEARCH, an agentic search framework, where the LLM…

    Submitted 26 May, 2025; originally announced May 2025.

    Comments: Work in progress

  14. arXiv:2505.15467 [pdf, ps, other]

    cs.CL cs.AI

    Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning

    Authors: Yukun Zhao, Lingyong Yan, Zhenyang Li, Shuaiqiang Wang, Zhumin Chen, Zhaochun Ren, Dawei Yin

    Abstract: Large language models have achieved remarkable success in various tasks. However, it is challenging for them to learn new tasks incrementally due to catastrophic forgetting. Existing approaches rely on experience replay, optimization constraints, or task differentiation, which encounter strict limitations in real-world scenarios. To address these issues, we propose Joint Flashback Adaptation. We f…

    Submitted 21 May, 2025; originally announced May 2025.

  15. arXiv:2505.07057 [pdf, ps, other]

    cs.CV

    DAPE: Dual-Stage Parameter-Efficient Fine-Tuning for Consistent Video Editing with Diffusion Models

    Authors: Junhao Xia, Chaoyang Zhang, Yecheng Zhang, Chengyang Zhou, Zhichang Wang, Bochun Liu, Dongshuo Yin

    Abstract: Video generation based on diffusion models presents a challenging multimodal task, with video editing emerging as a pivotal direction in this field. Recent video editing approaches primarily fall into two categories: training-required and training-free methods. While training-based methods incur high computational costs, training-free alternatives often yield suboptimal performance. To address the…

    Submitted 11 May, 2025; originally announced May 2025.

  16. arXiv:2505.03075 [pdf, other]

    cs.IR

    Direct Retrieval-augmented Optimization: Synergizing Knowledge Selection and Language Models

    Authors: Zhengliang Shi, Lingyong Yan, Weiwei Sun, Yue Feng, Pengjie Ren, Xinyu Ma, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Retrieval-augmented generation (RAG) integrates large language models (LLMs) with retrievers to access external knowledge, improving the factuality of LLM generation in knowledge-grounded tasks. To optimize the RAG performance, most previous work independently fine-tunes the retriever to adapt to frozen LLMs or trains the LLMs to use documents retrieved by off-the-shelf retrievers, lacking end-…

    Submitted 5 May, 2025; originally announced May 2025.

  17. arXiv:2504.17519 [pdf, other]

    cs.IR

    Replication and Exploration of Generative Retrieval over Dynamic Corpora

    Authors: Zhen Zhang, Xinyu Ma, Weiwei Sun, Pengjie Ren, Zhumin Chen, Shuaiqiang Wang, Dawei Yin, Maarten de Rijke, Zhaochun Ren

    Abstract: Generative retrieval (GR) has emerged as a promising paradigm in information retrieval (IR). However, most existing GR models are developed and evaluated using a static document collection, and their performance in dynamic corpora where document collections evolve continuously is rarely studied. In this paper, we first reproduce and systematically evaluate various representative GR approaches over…

    Submitted 24 April, 2025; originally announced April 2025.

    Comments: Accepted at SIGIR 2025 (Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval)

  18. arXiv:2504.13517 [pdf, other]

    cs.AI

    Optimizing Electric Vehicle Charging Station Locations: A Data-driven System with Multi-source Fusion

    Authors: Lihuan Li, Du Yin, Hao Xue, David Lillo-Trynes, Flora Salim

    Abstract: With the growing electric vehicles (EVs) charging demand, urban planners face the challenges of providing charging infrastructure at optimal locations. For example, range anxiety during long-distance travel and the inadequate distribution of residential charging stations are the major issues many cities face. To achieve reasonable estimation and deployment of the charging demand, we develop a data…

    Submitted 18 April, 2025; originally announced April 2025.

    Comments: 4-page short paper

  19. arXiv:2504.10208 [pdf, ps, other]

    cs.IR cs.LG

    From Prompting to Alignment: A Generative Framework for Query Recommendation

    Authors: Erxue Min, Hsiu-Yuan Huang, Xihong Yang, Min Yang, Xin Jia, Yunfang Wu, Hengyi Cai, Junfeng Wang, Shuaiqiang Wang, Dawei Yin

    Abstract: In modern search systems, search engines often suggest relevant queries to users through various panels or components, helping refine their information needs. Traditionally, these recommendations heavily rely on historical search logs to build models, which suffer from cold-start or long-tail issues. Furthermore, tasks such as query suggestion, completion or clarification are studied separately by…

    Submitted 5 July, 2025; v1 submitted 14 April, 2025; originally announced April 2025.

  20. arXiv:2504.08694 [pdf, other]

    cs.CL

    TP-RAG: Benchmarking Retrieval-Augmented Large Language Model Agents for Spatiotemporal-Aware Travel Planning

    Authors: Hang Ni, Fan Liu, Xinyu Ma, Lixin Su, Shuaiqiang Wang, Dawei Yin, Hui Xiong, Hao Liu

    Abstract: Large language models (LLMs) have shown promise in automating travel planning, yet they often fall short in addressing nuanced spatiotemporal rationality. While existing benchmarks focus on basic plan validity, they neglect critical aspects such as route efficiency, POI appeal, and real-time adaptability. This paper introduces TP-RAG, the first benchmark tailored for retrieval-augmented, spatiotem…

    Submitted 11 April, 2025; originally announced April 2025.

  21. arXiv:2504.05607 [pdf, other]

    cs.CL cs.AI

    FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction

    Authors: Qian-Wen Zhang, Fang Li, Jie Wang, Lingfeng Qiao, Yifei Yu, Di Yin, Xing Sun

    Abstract: Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text. However, a persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably recognizing unanswerable queries. Despite significant advances in large language models (LLMs) for reading comprehension, this issue remains critical, particul…

    Submitted 7 April, 2025; originally announced April 2025.

  22. arXiv:2504.05220 [pdf, other]

    cs.IR cs.AI cs.CL

    Leveraging LLMs for Utility-Focused Annotation: Reducing Manual Effort for Retrieval and RAG

    Authors: Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng

    Abstract: Retrieval models typically rely on costly human-labeled query-document relevance annotations for training and evaluation. To reduce this cost and leverage the potential of Large Language Models (LLMs) in relevance judgments, we aim to explore whether LLM-generated annotations can effectively replace human annotations in training retrieval models. Retrieval usually emphasizes relevance, which indic…

    Submitted 7 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: 12 pages, 4 figures

  23. arXiv:2504.05216 [pdf, other]

    cs.IR cs.AI cs.CL

    Unleashing the Power of LLMs in Dense Retrieval with Query Likelihood Modeling

    Authors: Hengran Zhang, Keping Bi, Jiafeng Guo, Xiaojie Sun, Shihao Liu, Daiting Shi, Dawei Yin, Xueqi Cheng

    Abstract: Dense retrieval is a crucial task in Information Retrieval (IR) and is the foundation for downstream tasks such as re-ranking. Recently, large language models (LLMs) have shown compelling semantic understanding capabilities and are appealing to researchers studying dense retrieval. LLMs, as decoder-style generative models, are competent at language generation while falling short on modeling global…

    Submitted 19 April, 2025; v1 submitted 7 April, 2025; originally announced April 2025.

    Comments: 12 pages, 3 figures

  24. arXiv:2504.04713 [pdf, other]

    cs.CL cs.IR

    Sequential-NIAH: A Needle-In-A-Haystack Benchmark for Extracting Sequential Needles from Long Contexts

    Authors: Yifei Yu, Qian-Wen Zhang, Lingfeng Qiao, Di Yin, Fang Li, Jie Wang, Zengxi Chen, Suncong Zheng, Xiaolong Liang, Xing Sun

    Abstract: Evaluating the ability of large language models (LLMs) to handle extended contexts is critical, particularly for retrieving information relevant to specific queries embedded within lengthy inputs. We introduce Sequential-NIAH, a benchmark specifically designed to evaluate the capability of LLMs to extract sequential information items (known as needles) from long contexts. The benchmark comprises t…

    Submitted 9 April, 2025; v1 submitted 6 April, 2025; originally announced April 2025.

  25. arXiv:2504.03624 [pdf, other]

    cs.CL cs.AI cs.LG

    Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

    Authors: NVIDIA: Aaron Blakeman, Aarti Basant, Abhinav Khattar, Adithya Renduchintala, Akhiad Bercovich, Aleksander Ficek, Alexis Bjorlin, Ali Taghibakhshi, Amala Sanjay Deshmukh, Ameya Sunil Mahabaleshwarkar, Andrew Tao, Anna Shors, Ashwath Aithal, Ashwin Poojary, Ayush Dattagupta, Balaram Buddharaju, Bobby Chen, Boris Ginsburg, Boxin Wang, Brandon Norick, Brian Butterfield, Bryan Catanzaro, Carlo del Mundo, et al. (176 additional authors not shown)

    Abstract: As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly becoming important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transf…

    Submitted 15 April, 2025; v1 submitted 4 April, 2025; originally announced April 2025.

  26. arXiv:2503.23427 [pdf, other]

    cs.CL cs.IR

    CoRanking: Collaborative Ranking with Small and Large Ranking Agents

    Authors: Wenhan Liu, Xinyu Ma, Yutao Zhu, Lixin Su, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou

    Abstract: Large Language Models (LLMs) have demonstrated superior listwise ranking performance. However, their superior performance often relies on large-scale parameters (e.g., GPT-4) and a repetitive sliding window process, which introduces significant efficiency challenges. In this paper, we propose CoRanking, a novel collaborative ranking framework that combines small and large ranking models fo…

    Submitted 31 March, 2025; v1 submitted 30 March, 2025; originally announced March 2025.

  27. arXiv:2503.17928 [pdf, other]

    cs.CV cs.CL

    Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization

    Authors: Zefeng Zhang, Hengzhu Tang, Jiawei Sheng, Zhenyu Zhang, Yiming Ren, Zhenyang Li, Dawei Yin, Duohe Ma, Tingwen Liu

    Abstract: Multimodal Large Language Models excel in various tasks, yet often struggle with modality bias, where the model tends to rely heavily on a single modality and overlook critical information in other modalities, which leads to incorrect focus and generating irrelevant responses. In this paper, we propose using the paradigm of preference optimization to solve the modality bias problem, including RLAI…

    Submitted 23 March, 2025; originally announced March 2025.

    Comments: CVPR 2025

  28. arXiv:2503.10615 [pdf, other]

    cs.CV

    R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization

    Authors: Yi Yang, Xiaoxuan He, Hongkun Pan, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Dacheng Yin, Fengyun Rao, Minfeng Zhu, Bo Zhang, Wei Chen

    Abstract: Large Language Models have demonstrated remarkable reasoning capability in complex textual tasks. However, multimodal reasoning, which requires integrating visual and textual information, remains a significant challenge. Existing visual-language models often struggle to effectively analyze and reason visual content, resulting in suboptimal performance on complex reasoning tasks. Moreover, the abse…

    Submitted 18 March, 2025; v1 submitted 13 March, 2025; originally announced March 2025.

    Comments: Code and Model: https://github.com/Fancy-MLLM/R1-onevision

  29. arXiv:2503.09382 [pdf, other]

    cs.IR cs.AI

    Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs

    Authors: Jiani Huang, Shijie Wang, Liang-bo Ning, Wenqi Fan, Shuaiqiang Wang, Dawei Yin, Qing Li

    Abstract: Recommender systems (RecSys) are widely used across various modern digital platforms and have garnered significant attention. Traditional recommender systems usually focus only on fixed and simple recommendation scenarios, making it difficult to generalize to new and unseen recommendation tasks in an interactive paradigm. Recently, the advancement of large language models (LLMs) has revolutionized…

    Submitted 12 March, 2025; originally announced March 2025.

  30. arXiv:2503.04772 [pdf, other]

    cs.LO cs.AI

    Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs

    Authors: David Yin, Jing Gao

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in generating mathematical proofs. However, a persistent challenge is that LLMs occasionally make mistakes, while even a minor mistake can invalidate an entire proof. Proof assistants like Lean offer a great remedy. They are designed for verifying each step of a proof in a formal language, and in recent years researchers have cre…

    Submitted 16 February, 2025; originally announced March 2025.

  31. arXiv:2503.01763 [pdf, other]

    cs.CL cs.AI cs.IR

    Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models

    Authors: Zhengliang Shi, Yuhan Wang, Lingyong Yan, Pengjie Ren, Shuaiqiang Wang, Dawei Yin, Zhaochun Ren

    Abstract: Tool learning aims to augment large language models (LLMs) with diverse tools, enabling them to act as agents for solving practical tasks. Due to the limited context length of tool-using LLMs, adopting information retrieval (IR) models to select useful tools from large toolsets is a critical initial step. However, the performance of IR models in tool retrieval tasks remains underexplored and uncle…

    Submitted 26 May, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

    Comments: ACL 2025. Code: https://github.com/mangopy/tool-retrieval-benchmark

  32. arXiv:2502.18353 [pdf, other]

    cs.CL cs.LG

    DBR: Divergence-Based Regularization for Debiasing Natural Language Understanding Models

    Authors: Zihao Li, Ruixiang Tang, Lu Cheng, Shuaiqiang Wang, Dawei Yin, Mengnan Du

    Abstract: Pre-trained language models (PLMs) have achieved impressive results on various natural language processing tasks. However, recent research has revealed that these models often rely on superficial features and shortcuts instead of developing a genuine understanding of language, especially for natural language understanding (NLU) tasks. Consequently, the models struggle to generalize to out-of-domai…

    Submitted 25 February, 2025; originally announced February 2025.

    Comments: Accepted by SIGKDD Explorations

  33. arXiv:2502.15693 [pdf, other]

    cs.IR cs.AI cs.LG

    Hgformer: Hyperbolic Graph Transformer for Recommendation

    Authors: Xin Yang, Xingrun Li, Heng Chang, Jinze Yang, Xihong Yang, Shengyu Tao, Ningkang Chang, Maiko Shigeno, Junfeng Wang, Dawei Yin, Erxue Min

    Abstract: The cold start problem is a challenging problem faced by most modern recommender systems. By leveraging knowledge from other domains, cross-domain recommendation can be an effective method to alleviate the cold start problem. However, the modelling distortion for long-tail data, which is widely present in recommender systems, is often overlooked in cross-domain recommendation. In this research, we…

    Submitted 30 December, 2024; originally announced February 2025.

  34. arXiv:2502.12970 [pdf, ps, other]

    cs.CL

    Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking

    Authors: Junda Zhu, Lingyong Yan, Shuaiqiang Wang, Dawei Yin, Lei Sha

    Abstract: Large Reasoning Models (LRMs) have demonstrated impressive performances across diverse domains. However, how safety of Large Language Models (LLMs) benefits from enhanced reasoning capabilities against jailbreak queries remains unexplored. To bridge this gap, in this paper, we propose Reasoning-to-Defend (R2D), a novel training paradigm that integrates a safety-aware reasoning mechanism into LLMs'…

    Submitted 29 May, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

    Comments: 18 pages

  35. Unbiased Learning to Rank with Query-Level Click Propensity Estimation: Beyond Pointwise Observation and Relevance

    Authors: Lulu Yu, Keping Bi, Jiafeng Guo, Shihao Liu, Dawei Yin, Xueqi Cheng

    Abstract: Most existing unbiased learning-to-rank (ULTR) approaches are based on the user examination hypothesis, which assumes that users will click a result only if it is both relevant and observed (typically modeled by position). However, in real-world scenarios, users often click only one or two results after examining multiple relevant options, due to limited patience or because their information needs…

    Submitted 18 February, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: 5 pages, 3 figures, accepted by The ACM Web Conference (WWW) 2025 Short Paper Track

  36. arXiv:2502.11387 [pdf, other]

    cs.CL

    RoleMRC: A Fine-Grained Composite Benchmark for Role-Playing and Instruction-Following

    Authors: Junru Lu, Jiazheng Li, Guodong Shen, Lin Gui, Siyu An, Yulan He, Di Yin, Xing Sun

    Abstract: Role-playing is important for Large Language Models (LLMs) to follow diverse instructions while maintaining role identity and the role's pre-defined ability limits. Existing role-playing datasets mostly contribute to controlling role style and knowledge boundaries, but overlook role-playing in instruction-following scenarios. We introduce a fine-grained role-playing and instruction-following compo…

    Submitted 16 February, 2025; originally announced February 2025.

  37. arXiv:2502.11177 [pdf, ps, other]

    cs.CL

    The Mirage of Model Editing: Revisiting Evaluation in the Wild

    Authors: Wanli Yang, Fei Sun, Jiajun Tan, Xinyu Ma, Qi Cao, Dawei Yin, Huawei Shen, Xueqi Cheng

    Abstract: Despite near-perfect results reported in the literature, the effectiveness of model editing in real-world applications remains unclear. To bridge this gap, we introduce QAEdit, a new benchmark aligned with widely used question answering (QA) datasets, and WILD, a task-agnostic evaluation framework designed to better reflect real-world usage of model editing. Our single editing experiments show tha…

    Submitted 31 May, 2025; v1 submitted 16 February, 2025; originally announced February 2025.

    Comments: Accepted to ACL 2025 Main Conference (Camera Ready Version)

  38. arXiv:2502.08346 [pdf, other]

    cs.IR cs.AI cs.LG

    Graph Foundation Models for Recommendation: A Comprehensive Survey

    Authors: Bin Wu, Yihang Wang, Yuanhao Zeng, Jiawei Liu, Jiashu Zhao, Cheng Yang, Yawen Li, Long Xia, Dawei Yin, Chuan Shi

    Abstract: Recommender systems (RS) serve as a fundamental tool for navigating the vast expanse of online information, with deep learning advancements playing an increasingly important role in improving ranking accuracy. Among these, graph neural networks (GNNs) excel at extracting higher-order structural information, while large language models (LLMs) are designed to process and comprehend natural language,…

    Submitted 16 February, 2025; v1 submitted 12 February, 2025; originally announced February 2025.

  39. arXiv:2502.05924 [pdf, other]

    cs.CV cs.IR

    Multi-Branch Collaborative Learning Network for Video Quality Assessment in Industrial Video Search

    Authors: Hengzhu Tang, Zefeng Zhang, Zhiping Li, Zhenyu Zhang, Xing Wu, Li Gao, Suqi Cheng, Dawei Yin

    Abstract: Video Quality Assessment (VQA) is vital for large-scale video retrieval systems, aimed at identifying quality issues to prioritize high-quality videos. In industrial systems, low-quality video characteristics fall into four categories: visual-related issues like mosaics and black boxes, textual issues from video titles and OCR content, and semantic issues like frame incoherence and frame-text mism…

    Submitted 9 February, 2025; originally announced February 2025.

    Comments: KDD 2025 ADS

  40. arXiv:2502.05690 [pdf, other]

    cs.AI econ.GN

    Managing Geological Uncertainty in Critical Mineral Supply Chains: A POMDP Approach with Application to U.S. Lithium Resources

    Authors: Mansur Arief, Yasmine Alonso, CJ Oshiro, William Xu, Anthony Corso, David Zhen Yin, Jef K. Caers, Mykel J. Kochenderfer

    Abstract: The world is entering an unprecedented period of critical mineral demand, driven by the global transition to renewable energy technologies and electric vehicles. This transition presents unique challenges in mineral resource development, particularly due to geological uncertainty, a key characteristic that traditional supply chain optimization approaches do not adequately address. To tackle this ch…

    Submitted 8 February, 2025; originally announced February 2025.

  41. arXiv:2502.02584 [pdf, other]

    cs.LG cs.AI

    QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search

    Authors: Zongyu Lin, Yao Tang, Xingcheng Yao, Da Yin, Ziniu Hu, Yizhou Sun, Kai-Wei Chang

    Abstract: Language agents have become a promising solution to complex interactive tasks. One of the key ingredients to the success of language agents is the reward model on the trajectory of the agentic workflow, which provides valuable guidance during training or inference. However, due to the lack of annotations of intermediate interactions, most existing works use an outcome reward model to optimize poli…

    Submitted 4 February, 2025; originally announced February 2025.

  42. arXiv:2502.01549  [pdf, other]

    cs.IR cs.AI cs.CV

    VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos

    Authors: Xubin Ren, Lingrui Xu, Long Xia, Shuaiqiang Wang, Dawei Yin, Chao Huang

    Abstract: Retrieval-Augmented Generation (RAG) has demonstrated remarkable success in enhancing Large Language Models (LLMs) through external knowledge integration, yet its application has primarily focused on textual content, leaving the rich domain of multi-modal video knowledge predominantly unexplored. This paper introduces VideoRAG, the first retrieval-augmented generation framework specifically design…

    Submitted 3 February, 2025; originally announced February 2025.

  43. arXiv:2501.15228  [pdf, other]

    cs.CL cs.IR

    Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning

    Authors: Yiqun Chen, Lingyong Yan, Weiwei Sun, Xinyu Ma, Yi Zhang, Shuaiqiang Wang, Dawei Yin, Yiming Yang, Jiaxin Mao

    Abstract: Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models, thereby minimizing hallucinations. A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation. However, these components are typically optimized separately through supervised fine-tun…

    Submitted 25 January, 2025; originally announced January 2025.

  44. arXiv:2501.12432  [pdf, other]

    cs.LG cs.AI cs.CL

    Divide-Then-Aggregate: An Efficient Tool Learning Method via Parallel Tool Invocation

    Authors: Dongsheng Zhu, Weixian Shi, Zhengliang Shi, Zhaochun Ren, Shuaiqiang Wang, Lingyong Yan, Dawei Yin

    Abstract: Although current Large Language Models (LLMs) exhibit impressive capabilities, performing complex real-world tasks still requires tool learning. Mainstream methods, such as CoT/ReAct, rely on step-by-step tool invocation to interact with external environments, but they are limited in perceptual scope and lack adequate task-planning capability. To address these limitations, other studies introduce…

    Submitted 25 May, 2025; v1 submitted 21 January, 2025; originally announced January 2025.

    Comments: Accepted to ACL 2025

  45. arXiv:2501.11671  [pdf, other]

    cs.IR

    Exploring Preference-Guided Diffusion Model for Cross-Domain Recommendation

    Authors: Xiaodong Li, Hengzhu Tang, Jiawei Sheng, Xinghua Zhang, Li Gao, Suqi Cheng, Dawei Yin, Tingwen Liu

    Abstract: Cross-domain recommendation (CDR) has proven to be a promising way to alleviate the cold-start issue, in which the most critical problem is how to draw an informative user representation in the target domain via the transfer of user preference existing in the source domain. Prior efforts mostly follow the embedding-and-mapping paradigm, which first integrate the preference into user representati…

    Submitted 20 January, 2025; originally announced January 2025.

    Comments: This paper is accepted by KDD'2025

  46. arXiv:2501.11034  [pdf, other]

    cs.IR

    Generative Retrieval for Book search

    Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Shihao Liu, Shuaiqing Wang, Dawei Yin, Xueqi Cheng

    Abstract: In book search, relevant book information should be returned in response to a query. Books contain complex, multi-faceted information such as metadata, outlines, and main text, where the outline provides hierarchical information between chapters and sections. Generative retrieval (GR) is a new retrieval paradigm that consolidates corpus information into a single model to generate identifiers of do…

    Submitted 19 January, 2025; originally announced January 2025.

    Comments: Accepted at KDD ADS 2025

  47. arXiv:2501.02226  [pdf, other]

    cs.IR

    Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation

    Authors: Shijie Wang, Wenqi Fan, Yue Feng, Shanru Lin, Xinyu Ma, Shuaiqiang Wang, Dawei Yin

    Abstract: Recommender systems have become increasingly vital in our daily lives, helping to alleviate the problem of information overload across various user-oriented online services. The emergence of Large Language Models (LLMs) has yielded remarkable achievements, demonstrating their potential for the development of next-generation recommender systems. Despite these advancements, LLM-based recommender sys…

    Submitted 28 May, 2025; v1 submitted 4 January, 2025; originally announced January 2025.

    Comments: Accepted by ACL 2025 main conference

  48. arXiv:2412.17847  [pdf, other]

    cs.AI cs.CL cs.CY cs.LG cs.MM

    Bridging the Data Provenance Gap Across Text, Speech and Video

    Authors: Shayne Longpre, Nikhil Singh, Manuel Cherep, Kushagra Tiwary, Joanna Materzynska, William Brannon, Robert Mahari, Naana Obeng-Marnu, Manan Dey, Mohammed Hamdy, Nayan Saxena, Ahmad Mustafa Anis, Emad A. Alghamdi, Vu Minh Chien, Da Yin, Kun Qian, Yizhi Li, Minnie Liang, An Dinh, Shrestha Mohanty, Deividas Mataciunas, Tobin South, Jianguo Zhang, Ariel N. Lee, Campbell S. Lund , et al. (18 additional authors not shown)

    Abstract: Progress in AI is driven largely by the scale and quality of training data. Despite this, there is a deficit of empirical analysis examining the attributes of well-established datasets beyond text. In this work we conduct the largest and first-of-its-kind longitudinal audit across modalities (popular text, speech, and video datasets), from their detailed sourcing trends and use restrictions to thei…

    Submitted 18 February, 2025; v1 submitted 18 December, 2024; originally announced December 2024.

    Comments: ICLR 2025. 10 pages, 5 figures (main paper)

  49. arXiv:2412.14574  [pdf, other]

    cs.IR cs.CL

    Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models

    Authors: Wenhan Liu, Xinyu Ma, Yutao Zhu, Ziliang Zhao, Shuaiqiang Wang, Dawei Yin, Zhicheng Dou

    Abstract: Large Language Models (LLMs) have shown exciting performance in listwise passage ranking. Due to the limited input length, existing methods often adopt the sliding window strategy. Such a strategy, though effective, is inefficient as it involves repetitive and serialized processing, which usually re-evaluates relevant passages multiple times. As a result, it incurs redundant API costs, which are p…

    Submitted 19 December, 2024; originally announced December 2024.

    Comments: 14 pages

  50. arXiv:2412.14510  [pdf, other]

    cs.CL cs.AI

    PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization

    Authors: Jiayi Wu, Hengyi Cai, Lingyong Yan, Hao Sun, Xiang Li, Shuaiqiang Wang, Dawei Yin, Ming Gao

    Abstract: The emergence of retrieval-augmented generation (RAG) has alleviated the issues of outdated and hallucinatory content in the generation of large language models (LLMs), yet it still reveals numerous limitations. When a general-purpose LLM serves as the RAG generator, it often suffers from inadequate response informativeness, response robustness, and citation quality. Past approaches to tackle thes…

    Submitted 18 December, 2024; originally announced December 2024.