Showing 1–24 of 24 results for author: Lv, A

  1. arXiv:2505.22653  [pdf, ps, other]

    cs.CL

    The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason

    Authors: Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan

    Abstract: Recent studies on post-training large language models (LLMs) for reasoning through reinforcement learning (RL) typically focus on tasks that can be accurately verified and rewarded, such as solving math problems. In contrast, our research investigates the impact of reward noise, a more practical consideration for real-world scenarios involving the post-training of LLMs using reward models. We foun…

    Submitted 28 May, 2025; originally announced May 2025.

    Comments: Preprint

  2. arXiv:2505.19179  [pdf, ps, other]

    cs.SD eess.AS eess.SP

    BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM

    Authors: Xun Gong, Anqi Lv, Zhiming Wang, Huijia Zhu, Yanmin Qian

    Abstract: While speech large language models (SpeechLLMs) have advanced standard automatic speech recognition (ASR), contextual biasing for named entities and rare words remains challenging, especially at scale. To address this, we propose BR-ASR: a Bias Retrieval framework for large-scale contextual biasing (up to 200k entries) via two innovations: (1) speech-and-bias contrastive learning to retrieve seman…

    Submitted 25 May, 2025; originally announced May 2025.

    Comments: Accepted by InterSpeech 2025

  3. arXiv:2505.16401  [pdf, ps, other]

    cs.LG

    Divide-Fuse-Conquer: Eliciting "Aha Moments" in Multi-Scenario Games

    Authors: Xiaoqing Zhang, Huabin Zheng, Ang Lv, Yuhan Liu, Zirui Song, Xiuying Chen, Rui Yan, Flood Sung

    Abstract: Large language models (LLMs) have been observed to suddenly exhibit advanced reasoning abilities during reinforcement learning (RL), resembling an "aha moment" triggered by simple outcome-based rewards. While RL has proven effective in eliciting such breakthroughs in tasks involving mathematics, coding, and vision, it faces significant challenges in multi-scenario games. The diversity of game ru…

    Submitted 12 June, 2025; v1 submitted 22 May, 2025; originally announced May 2025.

    Comments: 25 pages, 13 figures, and 8 tables

  4. arXiv:2504.16074  [pdf, other]

    cs.CL

    PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models

    Authors: Shi Qiu, Shaoyang Guo, Zhuo-Yang Song, Yunbo Sun, Zeyu Cai, Jiashen Wei, Tianyu Luo, Yixuan Yin, Haoxu Zhang, Yi Hu, Chenyang Wang, Chencheng Tang, Haoling Chang, Qi Liu, Ziheng Zhou, Tianyu Zhang, Jingtian Zhang, Zhangyi Liu, Minghao Li, Yuku Zhang, Boxuan Jing, Xianqi Yin, Yutong Ren, Zizhuo Fu, Jiaming Ji, et al. (29 additional authors not shown)

    Abstract: Current benchmarks for evaluating the reasoning capabilities of Large Language Models (LLMs) face significant limitations: task oversimplification, data contamination, and flawed evaluation items. These deficiencies necessitate more rigorous assessment methods. To address these limitations, we introduce PHYBench, a benchmark of 500 original physics problems ranging from high school to Physics Olym…

    Submitted 18 May, 2025; v1 submitted 22 April, 2025; originally announced April 2025.

    Comments: 34 pages, 12 figures, 7 tables, latest update on 2025/05/18

  5. arXiv:2502.00527  [pdf, other]

    cs.LG cs.CL

    PolarQuant: Leveraging Polar Transformation for Efficient Key Cache Quantization and Decoding Acceleration

    Authors: Songhao Wu, Ang Lv, Xiao Feng, Yufei Zhang, Xun Zhang, Guojun Yin, Wei Lin, Rui Yan

    Abstract: The KV cache in large language models is a dominant factor in memory usage, limiting their broader applicability. Quantizing the cache to lower bit widths is an effective way to reduce computational costs; however, previous methods struggle with quantizing key vectors due to outliers, resulting in excessive overhead. We propose a novel quantization approach called PolarQuant, which efficiently add…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: Preprint

  6. arXiv:2501.13074  [pdf, ps, other]

    cs.CL cs.AI cs.LG

    Autonomy-of-Experts Models

    Authors: Ang Lv, Ruobing Xie, Yining Qian, Songhao Wu, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

    Abstract: Mixture-of-Experts (MoE) models mostly use a router to assign tokens to specific expert modules, activating only partial parameters and often outperforming dense models. We argue that the separation between the router's decision-making and the experts' execution is a critical yet overlooked issue, leading to suboptimal expert selection and ineffective learning. To address this, we propose Autonomy…

    Submitted 29 May, 2025; v1 submitted 22 January, 2025; originally announced January 2025.

    Comments: Accepted by ICML 2025

  7. arXiv:2501.04070  [pdf, other]

    cs.LG cs.AI cs.CL

    More is not always better? Enhancing Many-Shot In-Context Learning with Differentiated and Reweighting Objectives

    Authors: Xiaoqing Zhang, Ang Lv, Yuhan Liu, Flood Sung, Wei Liu, Jian Luan, Shuo Shang, Xiuying Chen, Rui Yan

    Abstract: Large language models (LLMs) excel at few-shot in-context learning (ICL) without requiring parameter updates. However, as ICL demonstrations increase from a few to many, performance tends to plateau and eventually decline. We identify two primary causes for this trend: the suboptimal negative log-likelihood (NLL) optimization objective and the incremental data noise. To address these issues, we in…

    Submitted 27 May, 2025; v1 submitted 7 January, 2025; originally announced January 2025.

    Comments: 14 pages, 8 figures, 11 tables

  8. arXiv:2411.07176  [pdf, other]

    cs.CL cs.AI cs.LG

    More Expressive Attention with Negative Weights

    Authors: Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan

    Abstract: We propose a novel attention mechanism, named Cog Attention, that enables attention weights to be negative for enhanced expressiveness, which stems from two key factors: (1) Cog Attention enhances parameter flexibility. For example, unlike traditional softmax attention heads that use a static output-value (OV) matrix to delete or copy inputs that the heads attend to, Cog Attention naturally learns…

    Submitted 30 January, 2025; v1 submitted 11 November, 2024; originally announced November 2024.

  9. arXiv:2410.21216  [pdf, other]

    cs.CL cs.AI cs.LG

    HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation

    Authors: Yuhan Chen, Ang Lv, Jian Luan, Bin Wang, Wei Liu

    Abstract: Many positional encodings (PEs) are designed to exhibit long-term decay, based on an entrenched and long-standing inductive opinion: tokens farther away from the current position carry less relevant information. We argue that long-term decay is outdated in the era of LLMs, as LLMs are now applied to tasks demanding precise retrieval of in-context information from arbitrary positions. Firstly, we p…

    Submitted 5 December, 2024; v1 submitted 28 October, 2024; originally announced October 2024.

  10. arXiv:2409.19745  [pdf, other]

    cs.CL cs.AI

    PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

    Authors: Tao Tan, Yining Qian, Ang Lv, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan

    Abstract: Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this p…

    Submitted 7 October, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

    Comments: Preprint

  11. arXiv:2409.09281  [pdf, other]

    cs.CL cs.AI cs.LG

    Language Models "Grok" to Copy

    Authors: Ang Lv, Ruobing Xie, Xingwu Sun, Zhanhui Kang, Rui Yan

    Abstract: We examine the pre-training dynamics of language models, focusing on their ability to copy text from preceding context--a fundamental skill for various LLM applications, including in-context learning (ICL) and retrieval-augmented generation (RAG). We propose a novel perspective that Transformer-based language models develop copying abilities similarly to grokking, which refers to sudden generaliza…

    Submitted 5 February, 2025; v1 submitted 13 September, 2024; originally announced September 2024.

    Comments: NAACL 2025 main conference, short paper

  12. arXiv:2407.06677  [pdf, other]

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca…

    Submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2406.19598  [pdf, other]

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs uti…

    Submitted 16 October, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by NeurIPS 2024

  14. arXiv:2406.18920  [pdf, other]

    astro-ph.GA astro-ph.CO

    Cloud Crushing and Dissipation of Uniformly-Driven Adiabatic Turbulence in Circumgalactic Media

    Authors: Alex Lv, Lile Wang, Renyue Cen, Luis C. Ho

    Abstract: The circumgalactic medium (CGM) is responsive to kinetic disruptions generated by nearby astrophysical events. In this work, we study the saturation and dissipation of turbulent hydrodynamics within the CGM through an extensive array of 252 numerical simulations with a large parameter space. These simulations are endowed with proper cooling mechanisms to consistently explore the parameter space sp…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 24 pages, 20 figures

  15. arXiv:2404.00586  [pdf, other]

    cs.AI

    RLGNet: Repeating-Local-Global History Network for Temporal Knowledge Graph Reasoning

    Authors: Ao Lv, Guige Ouyang, Yongzhong Huang, Yue Chen, Haoran Xie

    Abstract: Temporal Knowledge Graph (TKG) reasoning involves predicting future events based on historical information. However, due to the unpredictability of future events, this task is highly challenging. To address this issue, we propose a multi-scale hybrid architecture model based on ensemble learning, called RLGNet (Repeating-Local-Global History Network). Inspired by the application of multi-scale inf…

    Submitted 28 July, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

  16. arXiv:2403.19521  [pdf, other]

    cs.CL cs.AI cs.LG

    Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models

    Authors: Ang Lv, Yuhan Chen, Kaiyi Zhang, Yulong Wang, Lifeng Liu, Ji-Rong Wen, Jian Xie, Rui Yan

    Abstract: In this paper, we delve into several mechanisms employed by Transformer-based language models (LLMs) for factual recall tasks. We outline a pipeline consisting of three major steps: (1) Given a prompt "The capital of France is," task-specific attention heads extract the topic token, such as "France," from the context and pass it to subsequent MLPs. (2) As attention heads' outputs are aggregate…

    Submitted 24 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  17. arXiv:2403.02178  [pdf, other]

    cs.CL cs.AI cs.LG

    Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models

    Authors: Changyu Chen, Xiting Wang, Ting-En Lin, Ang Lv, Yuchuan Wu, Xin Gao, Ji-Rong Wen, Rui Yan, Yongbin Li

    Abstract: In reasoning tasks, even a minor error can cascade into inaccurate results, leading to suboptimal performance of large language models in such domains. Earlier fine-tuning approaches sought to mitigate this by leveraging more precise supervisory signals from human labeling, larger models, or self-sampling, although at a high cost. Conversely, we develop a method that avoids external resources, rel…

    Submitted 10 July, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

    Comments: Accepted by ACL 2024

  18. arXiv:2401.06469  [pdf, other]

    cs.LG cs.CL

    Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning

    Authors: Kaiyi Zhang, Ang Lv, Yuhan Chen, Hansen Ha, Tao Xu, Rui Yan

    Abstract: In this paper, by treating in-context learning (ICL) as a meta-optimization process, we explain why LLMs are sensitive to the order of ICL examples. This understanding leads us to the development of Batch-ICL, an effective, efficient, and order-agnostic inference algorithm for ICL. Differing from the standard N-shot learning approach, Batch-ICL employs $N$ separate 1-shot forward computations and…

    Submitted 5 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

    Comments: Accepted by ACL 2024 (Findings)

  19. arXiv:2312.04455  [pdf, other]

    cs.CL cs.AI cs.LG

    Fortify the Shortest Stave in Attention: Enhancing Context Awareness of Large Language Models for Effective Tool Use

    Authors: Yuhan Chen, Ang Lv, Ting-En Lin, Changyu Chen, Yuchuan Wu, Fei Huang, Yongbin Li, Rui Yan

    Abstract: In this paper, we demonstrate that an inherent waveform pattern in the attention allocation of large language models (LLMs) significantly affects their performance in tasks demanding a high degree of context awareness, such as utilizing LLMs for tool use. Specifically, crucial information in the context may be overlooked by the model when it is positioned in the trough zone of the att…

    Submitted 4 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: ACL 2024 main

  20. arXiv:2311.07468  [pdf, other]

    cs.CL cs.AI cs.LG

    An Analysis and Mitigation of the Reversal Curse

    Authors: Ang Lv, Kaiyi Zhang, Shufang Xie, Quan Tu, Yuhan Chen, Ji-Rong Wen, Rui Yan

    Abstract: Recent research observed a noteworthy phenomenon in large language models (LLMs), referred to as the "reversal curse." The reversal curse is that when dealing with two entities, denoted as $a$ and $b$, connected by their relation $R$ and its inverse $R^{-1}$, LLMs excel in handling sequences in the form of "$aRb$," but encounter challenges when processing "$bR^{-1}a$," whether in generation…

    Submitted 10 November, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted by EMNLP 2024 Main. This paper was originally titled "Are We Falling into a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse." The title was revised during the submission to EMNLP, and we are now updating the title for this preprint version

  21. VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence

    Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun, Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu, Aiguo Lv, et al. (17 additional authors not shown)

    Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals, covering a broad range of ophthalmic diseases, modalities, imaging devices, and demography. After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications, such as disease screening and diagnosis, disease prognosis, subclassifi…

    Submitted 7 October, 2023; originally announced October 2023.

    Journal ref: The latest VisionFM work has been published in NEJM AI, 2024

  22. arXiv:2306.16770  [pdf, other]

    cs.CL cs.AI

    DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations

    Authors: Ang Lv, Jinpeng Li, Yuhan Chen, Xing Gao, Ji Zhang, Rui Yan

    Abstract: In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models poorly generalize and prefer responding safely. Many attempts have been made in either multi-turn settings from a one-to-many perspec…

    Submitted 29 June, 2023; originally announced June 2023.

    Comments: ACL 2023 main

  23. arXiv:2305.10841  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    GETMusic: Generating Any Music Tracks with a Unified Representation and Diffusion Framework

    Authors: Ang Lv, Xu Tan, Peiling Lu, Wei Ye, Shikun Zhang, Jiang Bian, Rui Yan

    Abstract: Symbolic music generation aims to create musical notes, which can help users compose music, such as generating target instrument tracks based on provided source tracks. In practical scenarios where there's a predefined ensemble of tracks and various composition needs, an efficient and effective generative model that can generate any target tracks based on the other tracks becomes crucial. However,…

    Submitted 29 September, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: 13 pages, 4 figures

  24. arXiv:2208.05697  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation

    Authors: Ang Lv, Xu Tan, Tao Qin, Tie-Yan Liu, Rui Yan

    Abstract: Lyric-to-melody generation is an important task in songwriting, and is also quite challenging due to its unique characteristics: the generated melodies should not only follow good musical patterns, but also align with features in lyrics such as rhythms and structures. These characteristics cannot be well handled by neural generation models that learn lyric-to-melody mapping in an end-to-end way, d…

    Submitted 28 January, 2023; v1 submitted 11 August, 2022; originally announced August 2022.