Showing 1–50 of 235 results for author: Yuan, B

Searching in archive cs.

  1. arXiv:2509.26182  [pdf, ps, other]

    cs.DC

    Parallax: Efficient LLM Inference Service over Decentralized Environment

    Authors: Chris Tong, Youhe Jiang, Gufeng Chen, Tianyi Zhao, Sibian Lu, Wenjie Qu, Eric Yang, Lynn Ai, Binhang Yuan

    Abstract: Deploying a large language model (LLM) inference service remains costly because centralized serving depends on specialized GPU clusters and high-bandwidth interconnects in datacenters. An appealing alternative is to leverage collaborative decentralized GPU pools. However, heterogeneity in GPU and limited interconnected network bandwidth, along with potentially dynamic availability, make efficient…

    Submitted 30 September, 2025; originally announced September 2025.

  2. Expanding Horizons of Level Diversity via Multi-objective Evolutionary Learning

    Authors: Qingquan Zhang, Ziqi Wang, Yuchen Li, Keyuan Zhang, Bo Yuan, Jialin Liu

    Abstract: In recent years, the generation of diverse game levels has gained increasing interest, contributing to a richer and more engaging gaming experience. A number of level diversity metrics have been proposed in literature, which are naturally multi-dimensional, leading to conflicted, complementary, or both relationships among these dimensions. However, existing level generation approaches often fail t…

    Submitted 29 September, 2025; originally announced September 2025.

    Comments: 12 pages, 6 figures

    Journal ref: IEEE Transactions on Artificial Intelligence (2024)

  3. arXiv:2509.23727  [pdf, ps, other]

    cs.SD cs.AI

    AudioMoG: Guiding Audio Generation with Mixture-of-Guidance

    Authors: Junyou Wang, Zehua Chen, Binjie Yuan, Kaiwen Zheng, Chang Li, Yuxuan Jiang, Jun Zhu

    Abstract: Guidance methods have demonstrated significant improvements in cross-modal audio generation, including text-to-audio (T2A) and video-to-audio (V2A) generation. The popularly adopted method, classifier-free guidance (CFG), steers generation by emphasizing condition alignment, enhancing fidelity but often at the cost of diversity. Recently, autoguidance (AG) has been explored for audio generation, e…

    Submitted 28 September, 2025; originally announced September 2025.

  4. arXiv:2509.23061  [pdf, ps, other]

    cs.PL cs.AI

    Local Success Does Not Compose: Benchmarking Large Language Models for Compositional Formal Verification

    Authors: Xu Xu, Xin Li, Xingwei Qu, Jie Fu, Binhang Yuan

    Abstract: We introduce DafnyCOMP, a benchmark for evaluating large language models (LLMs) on compositional specification generation in Dafny. Unlike prior benchmarks that focus on single-function tasks, DafnyCOMP targets programs composed of multiple interacting functions with data dependencies, requiring reasoning across component boundaries. The benchmark consists of 300 automatically synthesized multi-fu…

    Submitted 26 September, 2025; originally announced September 2025.

  5. arXiv:2509.20146  [pdf, ps, other]

    cs.CV cs.AI

    EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models

    Authors: Botai Yuan, Yutian Zhou, Yingjie Wang, Fushuo Huo, Yongcheng Jing, Li Shen, Ying Wei, Zhiqi Shen, Ziwei Liu, Tianwei Zhang, Jie Yang, Dacheng Tao

    Abstract: Recent benchmarks for medical Large Vision-Language Models (LVLMs) emphasize leaderboard accuracy, overlooking reliability and safety. We study sycophancy -- models' tendency to uncritically echo user-provided information -- in high-stakes clinical settings. We introduce EchoBench, a benchmark to systematically evaluate sycophancy in medical LVLMs. It contains 2,122 images across 18 departments an…

    Submitted 24 September, 2025; originally announced September 2025.

    Comments: 29 pages, 6 figures

  6. arXiv:2509.15940  [pdf, ps, other]

    cs.DC

    Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs

    Authors: Guoliang He, Youhe Jiang, Wencong Xiao, Kaihua Jiang, Shuguang Wang, Jun Wang, Zixian Du, Zhuo Jiang, Xinlei Zhang, Binhang Yuan, Eiko Yoneki

    Abstract: The scaling law for large language models (LLMs) depicts that the path towards machine intelligence necessitates training at large scale. Thus, companies continuously build large-scale GPU clusters, and launch training jobs that span over thousands of computing nodes. However, LLM pre-training presents unique challenges due to its complex communication patterns, where GPUs exchange data in sparse…

    Submitted 19 September, 2025; originally announced September 2025.

    Comments: NeurIPS 2025

  7. arXiv:2509.12508  [pdf, ps, other]

    cs.CL cs.AI cs.SD eess.AS

    FunAudio-ASR Technical Report

    Authors: Keyu An, Yanni Chen, Chong Deng, Changfeng Gao, Zhifu Gao, Bo Gong, Xiangang Li, Yabin Li, Xiang Lv, Yunjie Ji, Yiheng Jiang, Bin Ma, Haoneng Luo, Chongjia Ni, Zexu Pan, Yiping Peng, Zhendong Peng, Peiyao Wang, Hao Wang, Wen Wang, Wupeng Wang, Biao Tian, Zhentao Tan, Nan Yang, Bin Yuan, et al. (7 additional authors not shown)

    Abstract: In recent years, automatic speech recognition (ASR) has witnessed transformative advancements driven by three complementary paradigms: data scaling, model size scaling, and deep integration with large language models (LLMs). However, LLMs are prone to hallucination, which can significantly degrade user experience in real-world ASR applications. In this paper, we present FunAudio-ASR, a large-scale…

    Submitted 17 September, 2025; v1 submitted 15 September, 2025; originally announced September 2025.

    Comments: Authors are listed in alphabetical order

  8. arXiv:2509.05346  [pdf]

    cs.AI

    Benchmarking Large Language Models for Personalized Guidance in AI-Enhanced Learning

    Authors: Bo Yuan, Jiazi Hu

    Abstract: While Large Language Models (LLMs) are increasingly envisioned as intelligent assistants for personalized learning, systematic head-to-head evaluations within authentic learning scenarios remain limited. This study conducts an empirical comparison of three state-of-the-art LLMs on a tutoring task that simulates a realistic learning setting. Using a dataset comprising a student's answers to ten que…

    Submitted 2 September, 2025; originally announced September 2025.

  9. arXiv:2509.02591  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    Ensemble of Pathology Foundation Models for MIDOG 2025 Track 2: Atypical Mitosis Classification

    Authors: Mieko Ochi, Bae Yuan

    Abstract: Mitotic figures are classified into typical and atypical variants, with atypical counts correlating strongly with tumor aggressiveness. Accurate differentiation is therefore essential for patient prognostication and resource allocation, yet remains challenging even for expert pathologists. Here, we leveraged Pathology Foundation Models (PFMs) pre-trained on large histopathology datasets and applie…

    Submitted 18 September, 2025; v1 submitted 28 August, 2025; originally announced September 2025.

  10. arXiv:2509.00761  [pdf, ps, other]

    cs.AI cs.CL

    L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search

    Authors: Ziqi Wang, Boqin Yuan

    Abstract: We present L-MARS (Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search), a system that reduces hallucination and uncertainty in legal question answering through coordinated multi-agent reasoning and retrieval. Unlike single-pass retrieval-augmented generation (RAG), L-MARS decomposes queries into subproblems, issues targeted searches across heterogeneous sources (Serper web,…

    Submitted 2 September, 2025; v1 submitted 31 August, 2025; originally announced September 2025.

  11. arXiv:2508.18224  [pdf, ps, other]

    cs.DC cs.LG

    Flash Sparse Attention: An Alternative Efficient Implementation of Native Sparse Attention Kernel

    Authors: Ran Yan, Youhe Jiang, Binhang Yuan

    Abstract: Recent progress in sparse attention mechanisms has demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), a state-of-the-art approach, introduces natively trainable, hardware-aligned sparse attention that delivers substantial system-level performance gains while maintaining accuracy co…

    Submitted 25 August, 2025; originally announced August 2025.

  12. Ethical Considerations of Large Language Models in Game Playing

    Authors: Qingquan Zhang, Yuchen Li, Bo Yuan, Julian Togelius, Georgios N. Yannakakis, Jialin Liu

    Abstract: Large language models (LLMs) have demonstrated tremendous potential in game playing, while little attention has been paid to their ethical implications in those contexts. This work investigates and analyses the ethical considerations of applying LLMs in game playing, using Werewolf, also known as Mafia, as a case study. Gender bias, which affects game fairness and player experience, has been obser…

    Submitted 21 August, 2025; originally announced August 2025.

    Comments: 19 pages

    Journal ref: Frontiers of Computer Science (2025)

  13. arXiv:2508.15387  [pdf, ps, other]

    cs.CV

    DIO: Refining Mutual Information and Causal Chain to Enhance Machine Abstract Reasoning Ability

    Authors: Ruizhuo Song, Beiming Yuan

    Abstract: Despite the outstanding performance of current deep learning models across various domains, their fundamental bottleneck in abstract reasoning remains unresolved. To address this challenge, the academic community has introduced Raven's Progressive Matrices (RPM) problems as an authoritative benchmark for evaluating the abstract reasoning capabilities of deep learning algorithms, with a focus on co…

    Submitted 4 September, 2025; v1 submitted 21 August, 2025; originally announced August 2025.

    Comments: 15 pages, 9 figures, 8 tables

  14. arXiv:2508.15126  [pdf, ps, other]

    cs.AI cs.CL

    aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists

    Authors: Pengsong Zhang, Xiang Hu, Guowei Huang, Yang Qi, Heng Zhang, Xiuxu Li, Jiaxing Song, Jiabin Luo, Yijiang Li, Shuo Yin, Chengxiao Dai, Eric Hanchen Jiang, Xiaoyan Zhou, Zhenfei Yin, Boqin Yuan, Jing Dong, Guinan Su, Guanren Qiao, Haiming Tang, Anghong Du, Lili Pan, Zhenzhong Lan, Xinyu Liu

    Abstract: Recent advances in large language models (LLMs) have enabled AI agents to autonomously generate scientific proposals, conduct experiments, author papers, and perform peer reviews. Yet this flood of AI-generated research content collides with a fragmented and largely closed publication ecosystem. Traditional journals and conferences rely on human peer review, making them difficult to scale and ofte…

    Submitted 20 August, 2025; originally announced August 2025.

    Comments: Preprint under review. Code is available at https://github.com/aixiv-org. Website is available at https://forms.gle/DxQgCtXFsJ4paMtn8

  15. arXiv:2508.07101  [pdf, ps, other]

    cs.CL cs.AI

    Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

    Authors: Lijie Yang, Zhihao Zhang, Arti Jain, Shijie Cao, Baihong Yuan, Yiwei Chen, Zhihao Jia, Ravi Netravali

    Abstract: Large reasoning models achieve strong performance through test-time scaling but incur substantial computational overhead, particularly from excessive token generation when processing short input prompts. While sparse attention mechanisms can reduce latency and memory usage, existing approaches suffer from significant accuracy degradation due to accumulated errors during long-generation reasoning.…

    Submitted 9 August, 2025; originally announced August 2025.

  16. arXiv:2508.05298  [pdf, ps, other]

    cs.RO

    GhostShell: Streaming LLM Function Calls for Concurrent Embodied Programming

    Authors: Jian Gong, Youwei Huang, Bo Yuan, Ming Zhu, Zhou Liao, Jianhang Liang, Juncheng Zhan, Jinke Wang, Hang Shu, Mingyue Xiong, Yanjun Ye, Yufan Zu, Yang Zhou, Yihan Ding, Xuannian Chen, Xingyu Lu, Runjie Ban, Bingchao Huang, Fusen Liu

    Abstract: We present GhostShell, a novel approach that leverages Large Language Models (LLMs) to enable streaming and concurrent behavioral programming for embodied systems. In contrast to conventional methods that rely on pre-scheduled action sequences or behavior trees, GhostShell drives embodied systems to act on-the-fly by issuing function calls incrementally as tokens are streamed from the LLM. GhostSh…

    Submitted 8 August, 2025; v1 submitted 7 August, 2025; originally announced August 2025.

    Comments: 17 pages, 5 figures, conference

  17. arXiv:2507.16331  [pdf, ps, other]

    cs.CL

    Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny

    Authors: Chuanhao Yan, Fengdi Che, Xuhan Huang, Xu Xu, Xin Li, Yizhi Li, Xingwei Qu, Jingzhe Shi, Zhuangzhuang He, Chenghua Lin, Yaodong Yang, Binhang Yuan, Hang Zhao, Yu Qiao, Bowen Zhou, Jie Fu

    Abstract: Existing informal language-based (e.g., human language) Large Language Models (LLMs) trained with Reinforcement Learning (RL) face a significant challenge: their verification processes, which provide crucial training signals, are neither reliable nor scalable. In fact, the prevalent large proprietary models could hardly generate verifiable programs. A promising yet largely uncharted alternative is…

    Submitted 25 July, 2025; v1 submitted 22 July, 2025; originally announced July 2025.

  18. arXiv:2507.14266  [pdf]

    cs.CY cs.AI

    Bridging MOOCs, Smart Teaching, and AI: A Decade of Evolution Toward a Unified Pedagogy

    Authors: Bo Yuan, Jiazi Hu

    Abstract: Over the past decade, higher education has evolved through three distinct paradigms: the emergence of Massive Open Online Courses (MOOCs), the integration of Smart Teaching technologies into classrooms, and the rise of AI-enhanced learning. Each paradigm is intended to address specific challenges in traditional education: MOOCs enable ubiquitous access to learning resources; Smart Teaching support…

    Submitted 18 July, 2025; originally announced July 2025.

  19. arXiv:2507.13814  [pdf, ps, other]

    cs.MA

    CodeEdu: A Multi-Agent Collaborative Platform for Personalized Coding Education

    Authors: Jianing Zhao, Peng Gao, Jiannong Cao, Zhiyuan Wen, Chen Chen, Jianing Yin, Ruosong Yang, Bo Yuan

    Abstract: Large Language Models (LLMs) have demonstrated considerable potential in improving coding education by providing support for code writing, explanation, and debugging. However, existing LLM-based approaches generally fail to assess students' abilities, design learning plans, provide personalized material aligned with individual learning goals, and enable interactive learning. Current work mostly us…

    Submitted 18 July, 2025; originally announced July 2025.

    Comments: 4 pages, 4 figures. Demo video available at: https://youtu.be/9iIVmTT4CVk

  20. arXiv:2507.03341  [pdf, ps, other]

    eess.IV cs.CV physics.med-ph

    UltraDfeGAN: Detail-Enhancing Generative Adversarial Networks for High-Fidelity Functional Ultrasound Synthesis

    Authors: Zhuo Li, Xuhang Chen, Shuqiang Wang, Bin Yuan, Nou Sotheany, Ngeth Rithea

    Abstract: Functional ultrasound (fUS) is a neuroimaging technique known for its high spatiotemporal resolution, enabling non-invasive observation of brain activity through neurovascular coupling. Despite its potential in clinical applications such as neonatal monitoring and intraoperative guidance, the development of fUS faces challenges related to data scarcity and limitations in generating realistic fUS i…

    Submitted 19 August, 2025; v1 submitted 4 July, 2025; originally announced July 2025.

  21. arXiv:2506.16795  [pdf, ps, other]

    cs.NE cs.AI

    Robust Dynamic Material Handling via Adaptive Constrained Evolutionary Reinforcement Learning

    Authors: Chengpeng Hu, Ziming Wang, Bo Yuan, Jialin Liu, Chengqi Zhang, Xin Yao

    Abstract: Dynamic material handling (DMH) involves the assignment of dynamically arriving material transporting tasks to suitable vehicles in real time for minimising makespan and tardiness. In real-world scenarios, historical task records are usually available, which enables the training of a decision policy on multiple instances consisting of historical records. Recently, reinforcement learning has been a…

    Submitted 20 June, 2025; originally announced June 2025.

  22. arXiv:2506.11697  [pdf, ps, other]

    cs.SE

    SoK: Automated Vulnerability Repair: Methods, Tools, and Assessments

    Authors: Yiwei Hu, Zhen Li, Kedie Shu, Shenghua Guan, Deqing Zou, Shouhuai Xu, Bin Yuan, Hai Jin

    Abstract: The increasing complexity of software has led to the steady growth of vulnerabilities. Vulnerability repair investigates how to fix software vulnerabilities. Manual vulnerability repair is labor-intensive and time-consuming because it relies on human experts, highlighting the importance of Automated Vulnerability Repair (AVR). In this SoK, we present the systematization of AVR methods through the…

    Submitted 13 June, 2025; originally announced June 2025.

    Comments: The full version of "SoK: Automated Vulnerability Repair: Methods, Tools, and Assessments" accepted by the 34th USENIX Security Symposium (USENIX Security 2025)

  23. arXiv:2506.07235  [pdf, ps, other]

    cs.CV cs.CL

    Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

    Authors: Tianyi Bai, Zengjie Hu, Fupeng Sun, Jiantao Qiu, Yizhen Jiang, Guangxin He, Bohan Zeng, Conghui He, Binhang Yuan, Wentao Zhang

    Abstract: Multi-modal large language models (MLLMs) have achieved remarkable capabilities by integrating visual perception with language understanding, enabling applications such as image-grounded dialogue, visual question answering, and scientific analysis. However, most MLLMs adopt a static inference paradigm, encoding the entire image into fixed visual tokens upfront, which limits their ability to iterat…

    Submitted 8 June, 2025; originally announced June 2025.

  24. arXiv:2506.07227  [pdf, ps, other]

    cs.CV cs.CL

    Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning

    Authors: Tianyi Bai, Yuxuan Fan, Jiantao Qiu, Fupeng Sun, Jiayi Song, Junlin Han, Zichen Liu, Conghui He, Wentao Zhang, Binhang Yuan

    Abstract: Multimodal large language models (MLLMs) have achieved strong performance on vision-language tasks but still struggle with fine-grained visual differences, leading to hallucinations or missed semantic shifts. We attribute this to limitations in both training data and learning objectives. To address these issues, we propose a controlled data generation pipeline that produces minimally edited image…

    Submitted 8 June, 2025; originally announced June 2025.

  25. arXiv:2506.06084  [pdf, other]

    cs.CV

    WisWheat: A Three-Tiered Vision-Language Dataset for Wheat Management

    Authors: Bowen Yuan, Selena Song, Javier Fernandez, Yadan Luo, Mahsa Baktashmotlagh, Zijian Wang

    Abstract: Wheat management strategies play a critical role in determining yield. Traditional management decisions often rely on labour-intensive expert inspections, which are expensive, subjective and difficult to scale. Recently, Vision-Language Models (VLMs) have emerged as a promising solution to enable scalable, data-driven management support. However, due to a lack of domain-specific knowledge, directl…

    Submitted 6 June, 2025; originally announced June 2025.

  26. arXiv:2506.04203  [pdf, ps, other]

    cs.DC

    Cascadia: An Efficient Cascade Serving System for Large Language Models

    Authors: Youhe Jiang, Fangcheng Fu, Wanru Zhao, Stephan Rabanser, Jintao Zhang, Nicholas D. Lane, Binhang Yuan

    Abstract: Recent advances in large language models (LLMs) have intensified the need to deliver both rapid responses and high-quality outputs. More powerful models yield better results but incur higher inference latency, whereas smaller models are faster yet less capable. Recent work proposes balancing this latency-quality trade-off using model cascades, which route simpler queries to smaller models and more…

    Submitted 29 September, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

  27. arXiv:2506.01970  [pdf, ps, other]

    cs.LG cs.CV

    Johnny: Structuring Representation Space to Enhance Machine Abstract Reasoning Ability

    Authors: Ruizhuo Song, Beiming Yuan

    Abstract: This paper thoroughly investigates the challenges of enhancing AI's abstract reasoning capabilities, with a particular focus on Raven's Progressive Matrices (RPM) tasks involving complex human-like concepts. Firstly, it dissects the empirical reality that traditional end-to-end RPM-solving models heavily rely on option pool configurations, highlighting that this dependency constrains the model's r…

    Submitted 13 May, 2025; originally announced June 2025.

    Comments: 15 pages, 15 figures, 5 tables

  28. arXiv:2506.01352  [pdf, other]

    cs.LG

    TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network

    Authors: Guangxin He, Yuan Cao, Yutong He, Tianyi Bai, Kun Yuan, Binhang Yuan

    Abstract: Decentralized training of large language models offers the opportunity to pool computational resources across geographically distributed participants but faces significant network communication bottlenecks, particularly in pipeline-parallel settings. While pipeline parallelism partitions model layers across devices to handle large-scale models, it necessitates frequent communication of intermediat…

    Submitted 2 June, 2025; originally announced June 2025.

  29. arXiv:2505.24298  [pdf, ps, other]

    cs.LG cs.AI

    AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

    Authors: Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu

    Abstract: Reinforcement learning (RL) has become a dominant paradigm for training large language models (LLMs), particularly for reasoning tasks. Effective RL for LLMs requires massive parallelization and poses an urgent need for efficient training systems. Most existing large-scale RL systems for LLMs are synchronous, alternating generation and training in a batch setting where rollouts in each training ba…

    Submitted 12 September, 2025; v1 submitted 30 May, 2025; originally announced May 2025.

  30. arXiv:2505.18924  [pdf, ps, other]

    cs.CV

    LLM-Guided Taxonomy and Hierarchical Uncertainty for 3D Point Cloud Active Learning

    Authors: Chenxi Li, Nuo Chen, Fengyun Tan, Yantong Chen, Bochun Yuan, Tianrui Li, Chongshou Li

    Abstract: We present a novel active learning framework for 3D point cloud semantic segmentation that, for the first time, integrates large language models (LLMs) to construct hierarchical label structures and guide uncertainty-based sample selection. Unlike prior methods that treat labels as flat and independent, our approach leverages LLM prompting to automatically generate multi-level semantic taxonomies…

    Submitted 3 June, 2025; v1 submitted 24 May, 2025; originally announced May 2025.

  31. arXiv:2505.05286  [pdf, ps, other]

    cs.DB

    HEXGEN-TEXT2SQL: Optimizing LLM Inference Request Scheduling for Agentic Text-to-SQL Workflow

    Authors: You Peng, Youhe Jiang, Chen Wang, Binhang Yuan

    Abstract: Recent advances in leveraging the agentic paradigm of large language models (LLMs) utilization have significantly enhanced Text-to-SQL capabilities, enabling users without specialized database expertise to query data intuitively. However, deploying these agentic LLM-based Text-to-SQL systems in production poses substantial challenges due to their inherently multi-stage workflows, stringent latency…

    Submitted 8 May, 2025; originally announced May 2025.

  32. arXiv:2505.00725  [pdf, other]

    cs.CL cs.IR cs.LG

    FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models

    Authors: Bithiah Yuan

    Abstract: Motivated by the emerging demand in the financial industry for the automatic analysis of unstructured and structured data at scale, Question Answering (QA) systems can provide lucrative and competitive advantages to companies by facilitating the decision making of financial advisers. Consequently, we propose a novel financial QA system using the transformer-based pre-trained BERT language model to…

    Submitted 24 April, 2025; originally announced May 2025.

    Comments: Submitted in partial fulfillment of the requirements for the Master of Science degree in Computer Science at the University of Freiburg, July 31, 2020

    ACM Class: I.2.7; I.5.1; H.3.3

  33. arXiv:2504.02901  [pdf, other]

    cs.LG cs.AI

    Hide and Seek in Noise Labels: Noise-Robust Collaborative Active Learning with LLM-Powered Assistance

    Authors: Bo Yuan, Yulin Chen, Yin Zhang, Wei Jiang

    Abstract: Learning from noisy labels (LNL) is a challenge that arises in many real-world scenarios where collected training data can contain incorrect or corrupted labels. Most existing solutions identify noisy labels and adopt active learning to query human experts on them for denoising. In the era of large language models (LLMs), although we can reduce the human effort to improve these methods, their perf…

    Submitted 3 April, 2025; originally announced April 2025.

  34. arXiv:2504.01506  [pdf, other]

    cs.LG

    MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage

    Authors: Yongjun He, Roger Waleffe, Zhichao Han, Johnu George, Binhang Yuan, Zitao Zhang, Yinan Shan, Yang Zhao, Debojyoti Dutta, Theodoros Rekatsinas, Ce Zhang

    Abstract: Many modern machine learning (ML) methods rely on embedding models to learn vector representations (embeddings) for a set of entities (embedding tables). As increasingly diverse ML applications utilize embedding models and embedding tables continue to grow in size and number, there has been a surge in the ad-hoc development of specialized frameworks targeted to train large embedding models for spe…

    Submitted 2 April, 2025; originally announced April 2025.

    Comments: To appear in ICDE 2025

  35. arXiv:2503.18278  [pdf, other]

    cs.CV cs.AI

    TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model

    Authors: Cheng Yang, Yang Sui, Jinqi Xiao, Lingyi Huang, Yu Gong, Chendi Li, Jinghua Yan, Yu Bai, Ponnuswamy Sadayappan, Xia Hu, Bo Yuan

    Abstract: Vision-Language Models (VLMs) demand substantial computational resources during inference, largely due to the extensive visual input tokens for representing visual information. Previous studies have noted that visual tokens tend to receive less attention than text tokens, suggesting their lower importance during inference and potential for pruning. However, their methods encounter several challeng…

    Submitted 29 March, 2025; v1 submitted 23 March, 2025; originally announced March 2025.

    Comments: Accepted by CVPR 2025

  36. arXiv:2503.13935  [pdf, other]

    cs.CV

    SCORE: Soft Label Compression-Centric Dataset Condensation via Coding Rate Optimization

    Authors: Bowen Yuan, Yuxia Fu, Zijian Wang, Yadan Luo, Zi Huang

    Abstract: Dataset Condensation (DC) aims to obtain a condensed dataset that allows models trained on the condensed dataset to achieve performance comparable to those trained on the full dataset. Recent DC approaches increasingly focus on encoding knowledge into realistic images with soft labeling, for their scalability to ImageNet-scale datasets and strong capability of cross-domain generalization. However,…

    Submitted 18 March, 2025; originally announced March 2025.

  37. arXiv:2503.09022  [pdf, other]

    cs.CR

    Prompt Inversion Attack against Collaborative Inference of Large Language Models

    Authors: Wenjie Qu, Yuguang Zhou, Yongji Wu, Tingsong Xiao, Binhang Yuan, Yiming Li, Jiaheng Zhang

    Abstract: Large language models (LLMs) have been widely applied for their remarkable capability of content generation. However, the practical use of open-source LLMs is hindered by high resource requirements, making deployment expensive and limiting widespread development. The collaborative inference is a promising solution for this problem, in which users collaborate by each hosting a subset of layers and…

    Submitted 2 May, 2025; v1 submitted 11 March, 2025; originally announced March 2025.

    Comments: To appear at IEEE Symposium on Security and Privacy 2025

  38. arXiv:2503.04869  [pdf, other]

    cs.CL cs.AI

    Label Distribution Learning-Enhanced Dual-KNN for Text Classification

    Authors: Bo Yuan, Yulin Chen, Zhen Tan, Wang Jinyan, Huan Liu, Yin Zhang

    Abstract: Many text classification methods usually introduce external information (e.g., label descriptions and knowledge bases) to improve the classification performance. Compared to external information, some internal information generated by the model itself during training, like text embeddings and predicted label probability distributions, are exploited poorly when predicting the outcomes of some texts…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: Accepted by SDM 2024

  39. arXiv:2502.19543  [pdf]

    cs.LG math.NA physics.comp-ph

    High-fidelity Multiphysics Modelling for Rapid Predictions Using Physics-informed Parallel Neural Operator

    Authors: Biao Yuan, He Wang, Yanjie Song, Ana Heitor, Xiaohui Chen

    Abstract: Modelling complex multiphysics systems governed by nonlinear and strongly coupled partial differential equations (PDEs) is a cornerstone in computational science and engineering. However, it remains a formidable challenge for traditional numerical solvers due to high computational cost, making them impractical for large-scale applications. Neural operators' reliance on data-driven training limits…

    Submitted 26 February, 2025; originally announced February 2025.

    Comments: 10 pages, 11 figures, 1 table, 36 equations

  40. arXiv:2502.15386  [pdf, other]

    cs.ET quant-ph

    EDA-Q: Electronic Design Automation for Superconducting Quantum Chip

    Authors: Bo Zhao, Zhihang Li, Xiaohan Yu, Benzheng Yuan, Chaojie Zhang, Yimin Gao, Weilong Wang, Qing Mu, Shuya Wang, Huihui Sun, Tian Yang, Mengfan Zhang, Chuanbing Han, Peng Xu, Wenqing Wang, Zheng Shan

    Abstract: Electronic Design Automation (EDA) plays a crucial role in classical chip design and significantly influences the development of quantum chip design. However, traditional EDA tools cannot be directly applied to quantum chip design due to vast differences compared to the classical realm. Several EDA products tailored for quantum chip design currently exist, yet they only cover partial stages of the…

    Submitted 17 April, 2025; v1 submitted 21 February, 2025; originally announced February 2025.

    Comments: 12 pages, 11 figures, 4 tables

  41. arXiv:2502.12897  [pdf, ps, other]

    cs.IT math.CO

    On Zero Skip-Cost Generalized Fractional-Repetition Codes from Covering Designs

    Authors: Wenjun Yu, Bo-Jun Yuan, Moshe Schwartz

    Abstract: We study generalized fractional repetition codes that have zero skip cost, and which are based on covering designs. We show that a zero skip cost is always attainable, perhaps at a price of an expansion factor compared with the optimal size of fractional repetition codes based on Steiner systems. We provide three constructions, as well as show non-constructively, that no expansion is needed for al…

    Submitted 18 February, 2025; originally announced February 2025.

  42. arXiv:2502.12574  [pdf, other]

    cs.LG cs.AI

    HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

    Authors: Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bo Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, Anima Anandkumar

    Abstract: Transformer-based large language models (LLMs) demonstrate impressive performance in long context generation. Extending the context length has disproportionately shifted the memory footprint of LLMs during inference to the key-value cache (KV cache). In this paper, we propose HEADINFER, which offloads the KV cache to CPU RAM while avoiding the need to fully store the KV cache for any transformer l…

    Submitted 18 February, 2025; originally announced February 2025.

  43. arXiv:2502.07903  [pdf, other]

    cs.DC

    HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment

    Authors: Youhe Jiang, Ran Yan, Binhang Yuan

    Abstract: Disaggregating the prefill and decoding phases represents an effective new paradigm for generative inference of large language models (LLM), which eliminates prefill-decoding interference and optimizes resource allocation. However, it is still an open problem about how to deploy the disaggregated inference paradigm across a group of heterogeneous GPUs, which can be an economical alternative to dep…

    Submitted 11 February, 2025; originally announced February 2025.

    Comments: ICLR 2025

  44. arXiv:2502.01378  [pdf, other]

    cs.LG

    CE-LoRA: Computation-Efficient LoRA Fine-Tuning for Language Models

    Authors: Guanduo Chen, Yutong He, Yipeng Hu, Kun Yuan, Binhang Yuan

    Abstract: Large Language Models (LLMs) demonstrate exceptional performance across various tasks but demand substantial computational resources even for fine-tuning computation. Although Low-Rank Adaptation (LoRA) significantly alleviates memory consumption during fine-tuning, its impact on computational cost reduction is limited. This paper identifies the computation of activation gradients as the primary b…

    Submitted 3 February, 2025; originally announced February 2025.

  45. arXiv:2502.01159  [pdf, ps, other]

    cs.LG cs.AI

    AtmosSci-Bench: Evaluating the Recent Advance of Large Language Model for Atmospheric Science

    Authors: Chenyue Li, Wen Deng, Mengqian Lu, Binhang Yuan

    Abstract: The rapid advancements in large language models (LLMs), particularly in their reasoning capabilities, hold transformative potential for addressing complex challenges in atmospheric science. However, leveraging LLMs effectively in this domain requires a robust and comprehensive evaluation benchmark. Toward this end, we present AtmosSci-Bench, a novel benchmark designed to systematically assess LLM…

    Submitted 31 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

    Comments: 33 pages, 4 figures, 7 tables

  46. arXiv:2502.00722  [pdf, ps, other]

    cs.DC

    Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

    Authors: Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki

    Abstract: Recent advancements in Large Language Models (LLMs) have led to increasingly diverse requests, accompanied with varying resource (compute and memory) demands to serve them. However, this in turn degrades the cost-efficiency of LLM serving as common practices primarily rely on homogeneous GPU resources. In response to this problem, this work conducts a thorough study about serving LLMs over heterog…

    Submitted 5 June, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

  47. arXiv:2501.18794  [pdf]

    q-bio.GN cs.AI

    Survey and Improvement Strategies for Gene Prioritization with Large Language Models

    Authors: Matthew Neeley, Guantong Qi, Guanchu Wang, Ruixiang Tang, Dongxue Mao, Chaozhong Liu, Sasidhar Pasupuleti, Bo Yuan, Fan Xia, Pengfei Liu, Zhandong Liu, Xia Hu

    Abstract: Rare diseases are challenging to diagnose due to limited patient data and genetic diversity. Despite advances in variant prioritization, many cases remain undiagnosed. While large language models (LLMs) have performed well in medical exams, their effectiveness in diagnosing rare genetic diseases has not been assessed. To identify causal genes, we benchmarked various LLMs for gene prioritization. U…

    Submitted 30 January, 2025; originally announced January 2025.

    Comments: 11 pages, 4 figures, 10 pages of supplementary figures

  48. arXiv:2501.14224  [pdf, other]

    cs.AI cs.DB cs.LG

    Top Ten Challenges Towards Agentic Neural Graph Databases

    Authors: Jiaxin Bai, Zihao Wang, Yukun Zhou, Hang Yin, Weizhi Fei, Qi Hu, Zheye Deng, Jiayang Cheng, Tianshi Zheng, Hong Ting Tsang, Yisen Gao, Zhongwei Xie, Yufei Li, Lixin Fan, Binhang Yuan, Wei Wang, Lei Chen, Xiaofang Zhou, Yangqiu Song

    Abstract: Graph databases (GDBs) like Neo4j and TigerGraph excel at handling interconnected data but lack advanced inference capabilities. Neural Graph Databases (NGDBs) address this by integrating Graph Neural Networks (GNNs) for predictive analysis and reasoning over incomplete or noisy data. However, NGDBs rely on predefined queries and lack autonomy and adaptability. This paper introduces Agentic Neural…

    Submitted 23 January, 2025; originally announced January 2025.

    Comments: 12 pages

  49. arXiv:2412.15277  [pdf, other]

    cs.CL cs.AI

    PLPP: Prompt Learning with Perplexity Is Self-Distillation for Vision-Language Models

    Authors: Biao Liu, Wenyi Fang, Xiaoyu Wu, Yang Zheng, Zheng Hu, Bo Yuan

    Abstract: Pre-trained Vision-Language (VL) models such as CLIP have demonstrated their excellent performance across numerous downstream tasks. A recent method, Context Optimization (CoOp), further improves the performance of VL models on downstream tasks by introducing prompt learning. CoOp optimizes a set of learnable vectors, aka prompt, and freezes the whole CLIP model. However, relying solely on CLIP lo…

    Submitted 17 December, 2024; originally announced December 2024.

  50. arXiv:2412.02603  [pdf]

    cs.HC

    Generative AI as a Tool for Enhancing Reflective Learning in Students

    Authors: Bo Yuan, Jiazi Hu

    Abstract: Reflection is widely recognized as a cornerstone of student development, fostering critical thinking, self-regulation, and deep conceptual understanding. Traditionally, reflective skills have been cultivated through structured feedback, mentorship, and guided self-assessment. However, these approaches often face challenges such as limited scalability, difficulties in delivering individualized feed…

    Submitted 9 September, 2025; v1 submitted 19 November, 2024; originally announced December 2024.

    Comments: Accepted by IEEE TALE 2025