Showing 1–13 of 13 results for author: Qiao, A

Searching in archive cs.
  1. arXiv:2506.13996  [pdf, ps, other]

    cs.LG

    Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences

    Authors: Stas Bekman, Samyam Rajbhandari, Michael Wyatt, Jeff Rasley, Tunji Ruwase, Zhewei Yao, Aurick Qiao, Yuxiong He

    Abstract: Long sequences are critical for applications like RAG, long document summarization, multi-modality, etc., and modern LLMs, like Llama 4 Scout, support max sequence length of up to 10 million tokens. However, outside of enterprise labs, long sequence training is challenging for the AI community with limited system support in the open-source space. Out-of-box, even on a modern NVIDIA H100 80GB GPU…

    Submitted 16 June, 2025; originally announced June 2025.

    Comments: 19 pages, 13 figures

  2. arXiv:2505.05874  [pdf, ps, other]

    cs.LG physics.chem-ph q-bio.BM

    A 3D pocket-aware and evolutionary conserved interaction guided diffusion model for molecular optimization

    Authors: Anjie Qiao, Hao Zhang, Qianmu Yuan, Qirui Deng, Jingtian Su, Weifeng Huang, Huihao Zhou, Guo-Bo Li, Zhen Wang, Jinping Lei

    Abstract: Generating molecules that bind to specific protein targets via diffusion models has shown good promise for structure-based drug design and molecule optimization. Especially, the diffusion models with binding interaction guidance enables molecule generation with high affinity through forming favorable interaction within protein pocket. However, the generated molecules may not form interactions with…

    Submitted 9 May, 2025; originally announced May 2025.

  3. arXiv:2504.21065  [pdf, other]

    cs.LG cs.AI

    A 3D pocket-aware and affinity-guided diffusion model for lead optimization

    Authors: Anjie Qiao, Junjie Xie, Weifeng Huang, Hao Zhang, Jiahua Rao, Shuangjia Zheng, Yuedong Yang, Zhen Wang, Guo-Bo Li, Jinping Lei

    Abstract: Molecular optimization, aimed at improving binding affinity or other molecular properties, is a crucial task in drug discovery that often relies on the expertise of medicinal chemists. Recently, deep learning-based 3D generative models showed promise in enhancing the efficiency of molecular optimization. However, these models often struggle to adequately consider binding affinities with protein ta…

    Submitted 29 April, 2025; originally announced April 2025.

  4. arXiv:2412.20993  [pdf, other]

    cs.LG cs.CL

    Efficiently Scaling LLM Reasoning with Certaindex

    Authors: Yichao Fu, Junda Chen, Siqi Zhu, Zheyu Fu, Zhongdongming Dai, Yonghao Zhuang, Yian Ma, Aurick Qiao, Tajana Rosing, Ion Stoica, Hao Zhang

    Abstract: Test-time reasoning algorithms such as chain-of-thought, self-consistency, and MCTS enhance LLM problem-solving but can wastefully generate many tokens without improving accuracy. At the same time, we observe that these algorithms exhibit answer stabilization: their intermediate solutions often cease to change after a certain point, and further investment of compute does not change their final ans…

    Submitted 27 May, 2025; v1 submitted 30 December, 2024; originally announced December 2024.

  5. arXiv:2411.04975  [pdf, ps, other]

    cs.CL cs.AI cs.DC cs.LG

    SuffixDecoding: Extreme Speculative Decoding for Emerging AI Applications

    Authors: Gabriele Oliaro, Zhihao Jia, Daniel Campos, Aurick Qiao

    Abstract: Speculative decoding is widely adopted to reduce latency in large language model (LLM) inference by leveraging smaller draft models capable of handling diverse user tasks. However, emerging AI applications, such as LLM-based agents, present unique workload characteristics: instead of diverse independent requests, agentic frameworks typically submit repetitive inference requests, such as multi-agen…

    Submitted 2 June, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

  6. arXiv:2410.03960  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation

    Authors: Aurick Qiao, Zhewei Yao, Samyam Rajbhandari, Yuxiong He

    Abstract: LLM inference for enterprise applications, such as summarization, RAG, and code-generation, typically observe much longer prompt than generations, leading to high prefill cost and response latency. We present SwiftKV, a novel model transformation and distillation procedure targeted at reducing the prefill compute (in FLOPs) of prompt tokens while preserving high generation quality. First, SwiftKV…

    Submitted 1 June, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  7. arXiv:2409.06211  [pdf, other]

    cs.LG cs.CL

    STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning

    Authors: Jaeseong Lee, seung-won hwang, Aurick Qiao, Daniel F Campos, Zhewei Yao, Yuxiong He

    Abstract: Mixture-of-experts (MoEs) have been adopted for reducing inference costs by sparsely activating experts in Large language models (LLMs). Despite this reduction, the massive number of experts in MoEs still makes them expensive to serve. In this paper, we study how to address this, by pruning MoEs. Among pruning methodologies, unstructured pruning has been known to achieve the highest performance fo…

    Submitted 10 September, 2024; originally announced September 2024.

  8. arXiv:2408.15792  [pdf, other]

    cs.LG

    Efficient LLM Scheduling by Learning to Rank

    Authors: Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang

    Abstract: In Large Language Model (LLM) inference, the output length of an LLM request is typically regarded as not known a priori. Consequently, most LLM serving systems employ a simple First-come-first-serve (FCFS) scheduling strategy, leading to Head-Of-Line (HOL) blocking and reduced throughput and service quality. In this paper, we reexamine this assumption -- we show that, although predicting the exac…

    Submitted 28 August, 2024; originally announced August 2024.

  9. Nurgle: Exacerbating Resource Consumption in Blockchain State Storage via MPT Manipulation

    Authors: Zheyuan He, Zihao Li, Ao Qiao, Xiapu Luo, Xiaosong Zhang, Ting Chen, Shuwei Song, Dijun Liu, Weina Niu

    Abstract: Blockchains, with intricate architectures, encompass various components, e.g., consensus network, smart contracts, decentralized applications, and auxiliary services. While offering numerous advantages, these components expose various attack surfaces, leading to severe threats to blockchains. In this study, we unveil a novel attack surface, i.e., the state storage, in blockchains. The state storag…

    Submitted 15 June, 2024; originally announced June 2024.

  10. arXiv:2403.14280  [pdf, other]

    cs.CR

    Large Language Models for Blockchain Security: A Systematic Literature Review

    Authors: Zheyuan He, Zihao Li, Sen Yang, He Ye, Ao Qiao, Xiaosong Zhang, Xiapu Luo, Ting Chen

    Abstract: Large Language Models (LLMs) have emerged as powerful tools across various domains within cyber security. Notably, recent studies are increasingly exploring LLMs applied to the context of blockchain security (BS). However, there remains a gap in a comprehensive understanding regarding the full scope of applications, impacts, and potential constraints of LLMs on blockchain security. To fill this ga…

    Submitted 24 March, 2025; v1 submitted 21 March, 2024; originally announced March 2024.

  11. arXiv:2312.06550  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM360: Towards Fully Transparent Open-Source LLMs

    Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

    Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder prog…

    Submitted 11 December, 2023; originally announced December 2023.

  12. arXiv:2008.12260  [pdf, other]

    cs.DC cs.LG

    Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning

    Authors: Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, Gregory R. Ganger, Eric P. Xing

    Abstract: Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors both at the per-job level and at the cluster-wide level. Most existing schedulers expect users to specify the number of resources for each job, often leading to inefficient resource use. Some recent schedulers choose job resources for users, but do so without awareness of how D…

    Submitted 26 May, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

  13. arXiv:1810.07354  [pdf, other]

    cs.LG stat.ML

    Fault Tolerance in Iterative-Convergent Machine Learning

    Authors: Aurick Qiao, Bryon Aragam, Bingjing Zhang, Eric P. Xing

    Abstract: Machine learning (ML) training algorithms often possess an inherent self-correcting behavior due to their iterative-convergent nature. Recent systems exploit this property to achieve adaptability and efficiency in unreliable computing environments by relaxing the consistency of execution and allowing calculation errors to be self-corrected during training. However, the behavior of such systems are…

    Submitted 16 October, 2018; originally announced October 2018.