Showing 1–50 of 172 results for author: Xing, E P

Searching in archive cs.
  1. arXiv:2506.14965  [pdf, ps, other]

    cs.LG cs.AI cs.CL

    Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Authors: Zhoujun Cheng, Shibo Hao, Tianyang Liu, Fan Zhou, Yutao Xie, Feng Yao, Yuexin Bian, Yonghao Zhuang, Nilabjo Dey, Yuheng Zha, Yi Gu, Kun Zhou, Yuqi Wang, Yuan Li, Richard Fan, Jianshu She, Chengqian Gao, Abulhair Saparov, Haonan Li, Taylor W. Killian, Mikhail Yurochkin, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general reasoning. A key challenge lies in the lack of reliable, scalable RL reward signals across diverse reasoning domains. We introduce Guru, a curated RL reasoning corpu…

    Submitted 17 June, 2025; originally announced June 2025.

    Comments: 38 pages, 9 figures. Under review

  2. arXiv:2506.04761  [pdf, ps, other]

    cs.LG

    Log-Linear Attention

    Authors: Han Guo, Songlin Yang, Tarushii Goel, Eric P. Xing, Tri Dao, Yoon Kim

    Abstract: The attention mechanism in Transformers is an important primitive for accurate and scalable sequence modeling. Its quadratic-compute and linear-memory complexity however remain significant bottlenecks. Linear attention and state-space models enable linear-time, constant-memory sequence modeling and can moreover be trained efficiently through matmul-rich parallelization across sequence length. Howe…

    Submitted 25 June, 2025; v1 submitted 5 June, 2025; originally announced June 2025.

  3. arXiv:2505.15146  [pdf, ps, other]

    cs.AI

    lmgame-Bench: How Good are LLMs at Playing Games?

    Authors: Lanxiang Hu, Mingjia Huo, Yuxuan Zhang, Haoyang Yu, Eric P. Xing, Ion Stoica, Tajana Rosing, Haojian Jin, Hao Zhang

    Abstract: Playing video games requires perception, memory, and planning, exactly the faculties modern large language model (LLM) agents are expected to master. We study the major challenges in using popular video games to evaluate modern LLMs and find that directly dropping LLMs into games cannot make an effective evaluation, for three reasons -- brittle vision perception, prompt sensitivity, and potential…

    Submitted 3 June, 2025; v1 submitted 21 May, 2025; originally announced May 2025.

  4. arXiv:2505.12808  [pdf, ps, other]

    cs.CL cs.LG

    Decentralized Arena: Towards Democratic and Scalable Automatic Evaluation of Language Models

    Authors: Yanbin Yin, Kun Zhou, Zhen Wang, Xiangdong Zhang, Yifei Shao, Shibo Hao, Yi Gu, Jieyuan Liu, Somanshu Singla, Tianyang Liu, Eric P. Xing, Zhengzhong Liu, Haojian Jin, Zhiting Hu

    Abstract: The recent explosion of large language models (LLMs), each with its own general or specialized strengths, makes scalable, reliable benchmarking more urgent than ever. Standard practices nowadays face fundamental trade-offs: closed-ended question-based benchmarks (e.g., MMLU) struggle with saturation as newer models emerge, while crowd-sourced leaderboards (e.g., Chatbot Arena) rely on costly and slow hu…

    Submitted 19 May, 2025; originally announced May 2025.

    Comments: 20 pages, ongoing work

  5. arXiv:2504.02807  [pdf, other]

    cs.CL cs.AI cs.LG

    MegaMath: Pushing the Limits of Open Math Corpora

    Authors: Fan Zhou, Zengzhi Wang, Nikhil Ranjan, Zhoujun Cheng, Liping Tang, Guowei He, Zhengzhong Liu, Eric P. Xing

    Abstract: Mathematical reasoning is a cornerstone of human intelligence and a key benchmark for advanced capabilities in large language models (LLMs). However, the research community still lacks an open, large-scale, high-quality corpus tailored to the demands of math-centric LLM pre-training. We present MegaMath, an open dataset curated from diverse, math-focused sources through following practices: (1) Re…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: 26 pages, 15 figures, 22 tables

  6. arXiv:2502.16840  [pdf, other]

    cs.LG cs.AI

    In-context learning of evolving data streams with tabular foundational models

    Authors: Afonso Lourenço, João Gama, Eric P. Xing, Goreti Marreiros

    Abstract: State-of-the-art data stream mining in supervised classification has traditionally relied on ensembles of incremental decision trees. However, the emergence of large tabular models, i.e., transformers designed for structured numerical data, marks a significant paradigm shift. These models move beyond traditional weight updates, instead employing in-context learning through prompt tuning. By using…

    Submitted 23 February, 2025; originally announced February 2025.

  7. arXiv:2502.01976  [pdf, other]

    cs.CL cs.AI cs.LG cs.PF

    CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

    Authors: Wenhao Zheng, Yixiao Chen, Weitong Zhang, Souvik Kundu, Yun Li, Zhengzhong Liu, Eric P. Xing, Hongyi Wang, Huaxiu Yao

    Abstract: Large language models have achieved remarkable success in various tasks but suffer from high computational costs during inference, limiting their deployment in resource-constrained applications. To address this issue, we propose a novel Collaborative Inference with Token-lEvel Routing (CITER) framework that enables efficient collaboration between small and large language models (SLMs & LLMs) thro…

    Submitted 1 May, 2025; v1 submitted 3 February, 2025; originally announced February 2025.

  8. arXiv:2501.09163  [pdf, other]

    cs.LG cs.AI stat.ML

    Towards Understanding Extrapolation: a Causal Lens

    Authors: Lingjing Kong, Guangyi Chen, Petar Stojanov, Haoxuan Li, Eric P. Xing, Kun Zhang

    Abstract: Canonical work handling distribution shifts typically necessitates an entire target distribution that lands inside the training distribution. However, practical scenarios often involve only a handful of target samples, potentially lying outside the training support, which requires the capability of extrapolation. In this work, we aim to provide a theoretical understanding of when extrapolation is…

    Submitted 15 January, 2025; originally announced January 2025.

    Comments: NeurIPS 2024

  9. arXiv:2411.15418  [pdf, other]

    q-bio.BM cs.LG

    Scaling Structure Aware Virtual Screening to Billions of Molecules with SPRINT

    Authors: Andrew T. McNutt, Abhinav K. Adduri, Caleb N. Ellington, Monica T. Dayao, Eric P. Xing, Hosein Mohimani, David R. Koes

    Abstract: Virtual screening of small molecules against protein targets can accelerate drug discovery and development by predicting drug-target interactions (DTIs). However, structure-based methods like molecular docking are too slow to allow for broad proteome-scale screens, limiting their application in screening for off-target effects or new molecular mechanisms. Recently, vector-based methods using prote…

    Submitted 20 January, 2025; v1 submitted 22 November, 2024; originally announced November 2024.

  10. arXiv:2411.10645  [pdf, other]

    cs.LG stat.ML

    Patient-Specific Models of Treatment Effects Explain Heterogeneity in Tuberculosis

    Authors: Ethan Wu, Caleb Ellington, Ben Lengerich, Eric P. Xing

    Abstract: Tuberculosis (TB) is a major global health challenge, and is compounded by co-morbidities such as HIV, diabetes, and anemia, which complicate treatment outcomes and contribute to heterogeneous patient responses. Traditional models of TB often overlook this heterogeneity by focusing on broad, pre-defined patient groups, thereby missing the nuanced effects of individual patient contexts. We propose…

    Submitted 15 November, 2024; originally announced November 2024.

    Comments: Findings paper presented at Machine Learning for Health (ML4H) symposium 2024, December 15-16, 2024, Vancouver, Canada, 4 pages

  11. arXiv:2411.08733  [pdf, other]

    cs.CL

    Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models

    Authors: Somanshu Singla, Zhen Wang, Tianyang Liu, Abdullah Ashfaq, Zhiting Hu, Eric P. Xing

    Abstract: Aligning Large Language Models (LLMs) traditionally relies on costly training and human preference annotations. Self-alignment seeks to reduce these expenses by enabling models to align themselves. To further lower costs and achieve alignment without any expensive tuning or annotations, we introduce a new tuning-free approach for self-alignment, Dynamic Rewarding with Prompt Optimization (DRPO). O…

    Submitted 13 November, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

    Comments: EMNLP 2024 Main

  12. arXiv:2411.06518  [pdf, other]

    cs.LG q-bio.QM stat.ME

    Causal Representation Learning from Multimodal Biomedical Observations

    Authors: Yuewen Sun, Lingjing Kong, Guangyi Chen, Loka Li, Gongxu Luo, Zijian Li, Yixuan Zhang, Yujia Zheng, Mengyue Yang, Petar Stojanov, Eran Segal, Eric P. Xing, Kun Zhang

    Abstract: Prevalent in biomedical applications (e.g., human phenotype research), multimodal datasets can provide valuable insights into the underlying physiological mechanisms. However, current machine learning (ML) models designed to analyze these datasets often lack interpretability and identifiability guarantees, which are essential for biomedical research. Recent advances in causal representation learni…

    Submitted 16 March, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

  13. arXiv:2411.04156  [pdf, other]

    cs.SE cs.AI cs.CL

    Crystal: Illuminating LLM Abilities on Language and Code

    Authors: Tianhua Tao, Junbo Li, Bowen Tan, Hongyi Wang, William Marshall, Bhargav M Kanakiya, Joel Hestness, Natalia Vassilieva, Zhiqiang Shen, Eric P. Xing, Zhengzhong Liu

    Abstract: Large Language Models (LLMs) specializing in code generation (which are also often referred to as code LLMs), e.g., StarCoder and Code Llama, play increasingly critical roles in various software development scenarios. It is also crucial for code LLMs to possess both code generation and natural language abilities for many specific applications, such as code snippet retrieval using natural language…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: Published as a conference paper at COLM 2024

  14. arXiv:2408.10189  [pdf, other]

    cs.LG cs.AI

    Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models

    Authors: Aviv Bick, Kevin Y. Li, Eric P. Xing, J. Zico Kolter, Albert Gu

    Abstract: Transformer architectures have become a dominant paradigm for domains like language modeling but suffer in many inference settings due to their quadratic-time self-attention. Recently proposed subquadratic architectures, such as Mamba, have shown promise, but have been pretrained with substantially less computational resources than the strongest Transformer models. In this work, we present a metho…

    Submitted 8 February, 2025; v1 submitted 19 August, 2024; originally announced August 2024.

  15. arXiv:2407.10960  [pdf, other]

    cs.LG cs.CL cs.DC

    Fast Matrix Multiplications for Lookup Table-Quantized LLMs

    Authors: Han Guo, William Brandon, Radostin Cholakov, Jonathan Ragan-Kelley, Eric P. Xing, Yoon Kim

    Abstract: The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. Howe…

    Submitted 16 January, 2025; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: EMNLP 2024 (Findings)

  16. arXiv:2406.20098  [pdf, other]

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scal…

    Submitted 17 November, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 Datasets and Benchmarks Camera-ready Version. Website at https://mbzuai-llm.github.io/webpage2code/

  17. arXiv:2406.09455  [pdf, other]

    cs.CV cs.AI cs.CL

    Pandora: Towards General World Model with Natural Language Actions and Video States

    Authors: Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, Yuheng Zha, Zeyu Feng, Tianhua Tao, Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu

    Abstract: World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on language modality and their limited understanding of the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Website: https://world-model.maitrix.org/

  18. arXiv:2406.00519  [pdf, other]

    cs.LG cs.AI stat.ML

    Learning Discrete Concepts in Latent Hierarchical Models

    Authors: Lingjing Kong, Guangyi Chen, Biwei Huang, Eric P. Xing, Yuejie Chi, Kun Zhang

    Abstract: Learning concepts from natural high-dimensional data (e.g., images) holds potential in building human-aligned and interpretable machine learning models. Despite its encouraging prospect, formalization and theoretical insights into this crucial task are still lacking. In this work, we formalize concepts as discrete latent causal variables that are related via a hierarchical causal model that encode…

    Submitted 14 January, 2025; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024

  19. arXiv:2404.02852  [pdf, other]

    cs.LG

    Toward Inference-optimal Mixture-of-Expert Large Language Models

    Authors: Longfei Yun, Yonghao Zhuang, Yao Fu, Eric P Xing, Hao Zhang

    Abstract: Mixture-of-Expert (MoE) based large language models (LLMs), such as the recent Mixtral and DeepSeek-MoE, have shown great promise in scaling model size without suffering from the quadratic growth of training cost of dense transformers. Like dense models, training MoEs requires answering the same question: given a training budget, what is the optimal allocation on the model size and number of token…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  20. arXiv:2402.19009  [pdf, other]

    cs.LG cs.AI

    Unified Generation, Reconstruction, and Representation: Generalized Diffusion with Adaptive Latent Encoding-Decoding

    Authors: Guangyi Liu, Yu Wang, Zeyu Feng, Qiyu Wu, Liping Tang, Yuan Gao, Zhen Li, Shuguang Cui, Julian McAuley, Zichao Yang, Eric P. Xing, Zhiting Hu

    Abstract: The vast applications of deep generative models are anchored in three core capabilities -- generating new instances, reconstructing inputs, and learning compact representations -- across various data types, such as discrete text/protein sequences and continuous images. Existing model families, like variational autoencoders (VAEs), generative adversarial networks (GANs), autoregressive models, and…

    Submitted 5 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: ICML 2024 camera-ready. Code is available at https://github.com/guangyliu/EDDPM

  21. arXiv:2402.16840  [pdf, other]

    cs.CL

    MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT

    Authors: Omkar Thawakar, Ashmal Vayani, Salman Khan, Hisham Cholakal, Rao M. Anwer, Michael Felsberg, Tim Baldwin, Eric P. Xing, Fahad Shahbaz Khan

    Abstract: "Bigger the better" has been the predominant trend in recent Large Language Models (LLMs) development. However, LLMs do not suit well for scenarios that require on-device processing, energy efficiency, low memory footprint, and response efficiency. These requisites are crucial for privacy, security, and sustainable deployment. This paper explores the "less is more" paradigm by addressing the chall…

    Submitted 26 February, 2024; originally announced February 2024.

    Comments: Code available at : https://github.com/mbzuai-oryx/MobiLlama

  22. arXiv:2312.06550  [pdf, other]

    cs.CL cs.AI cs.LG

    LLM360: Towards Fully Transparent Open-Source LLMs

    Authors: Zhengzhong Liu, Aurick Qiao, Willie Neiswanger, Hongyi Wang, Bowen Tan, Tianhua Tao, Junbo Li, Yuqi Wang, Suqi Sun, Omkar Pangarkar, Richard Fan, Yi Gu, Victor Miller, Yonghao Zhuang, Guowei He, Haonan Li, Fajri Koto, Liping Tang, Nikhil Ranjan, Zhiqiang Shen, Xuguang Ren, Roberto Iriondo, Cun Mu, Zhiting Hu, Mark Schulze, et al. (3 additional authors not shown)

    Abstract: The recent surge in open-source Large Language Models (LLMs), such as LLaMA, Falcon, and Mistral, provides diverse options for AI practitioners and researchers. However, most LLMs have only released partial artifacts, such as the final model weights or inference code, and technical reports increasingly limit their scope to high-level design choices and surface statistics. These choices hinder prog…

    Submitted 11 December, 2023; originally announced December 2023.

  23. arXiv:2311.12023  [pdf, other]

    cs.CL cs.LG

    LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning

    Authors: Han Guo, Philip Greengard, Eric P. Xing, Yoon Kim

    Abstract: We propose a simple approach for memory-efficient adaptation of pretrained language models. Our approach uses an iterative algorithm to decompose each pretrained matrix into a high-precision low-rank component and a memory-efficient quantized component. During finetuning, the quantized component remains fixed and only the low-rank component is updated. We present an integer linear programming form…

    Submitted 26 August, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  24. arXiv:2310.16427  [pdf, other]

    cs.CL

    PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization

    Authors: Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric P. Xing, Zhiting Hu

    Abstract: Highly effective, task-specific prompts are often heavily engineered by experts to integrate detailed instructions and domain insights based on a deep understanding of both instincts of large language models (LLMs) and the intricacies of the target task. However, automating the generation of such expert-level prompts remains elusive. Existing prompt optimization methods tend to overlook the depth…

    Submitted 7 December, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

    Comments: 34 pages, 10 figures

  25. arXiv:2310.11340  [pdf, other]

    stat.ML cs.LG

    Contextualized Machine Learning

    Authors: Benjamin Lengerich, Caleb N. Ellington, Andrea Rubbi, Manolis Kellis, Eric P. Xing

    Abstract: We examine Contextualized Machine Learning (ML), a paradigm for learning heterogeneous and context-dependent effects. Contextualized ML estimates heterogeneous functions by applying deep learning to the meta-relationship between contextual information and context-specific parametric models. This is a form of varying-coefficient modeling that unifies existing frameworks including cluster analysis a…

    Submitted 17 October, 2023; originally announced October 2023.

  26. arXiv:2310.07918  [pdf, other]

    cs.LG cs.AI stat.ML

    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning

    Authors: Jannik Deuschel, Caleb N. Ellington, Yingtao Luo, Benjamin J. Lengerich, Pascal Friederich, Eric P. Xing

    Abstract: Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models force a tradeoff between accuracy and interpretability, limiting data-driven interpretations of human decision-making processes. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, w…

    Submitted 7 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

  27. arXiv:2310.03294  [pdf, other]

    cs.LG cs.AI cs.DC

    DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training

    Authors: Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Xuezhe Ma, Ion Stoica, Joseph E. Gonzalez, Hao Zhang

    Abstract: FlashAttention (Dao, 2023) effectively reduces the quadratic peak memory usage to linear in training transformer-based large language models (LLMs) on a single GPU. In this paper, we introduce DISTFLASHATTN, a distributed memory-efficient attention mechanism optimized for long-context LLMs training. We propose three key techniques: token-level workload balancing, overlapping key-value communicatio…

    Submitted 31 March, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

  28. arXiv:2310.03163  [pdf, other]

    cs.LG

    FedNAR: Federated Optimization with Normalized Annealing Regularization

    Authors: Junbo Li, Ang Li, Chong Tian, Qirong Ho, Eric P. Xing, Hongyi Wang

    Abstract: Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfi…

    Submitted 4 October, 2023; originally announced October 2023.

    Comments: Thirty-seventh Conference on Neural Information Processing Systems

    Journal ref: Thirty-seventh Conference on Neural Information Processing Systems, 2023

  29. arXiv:2309.11998  [pdf, other]

    cs.CL cs.AI

    LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Tianle Li, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zhuohan Li, Zi Lin, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: Studying how people interact with large language models (LLMs) in real-world scenarios is increasingly important due to their widespread use in various applications. In this paper, we introduce LMSYS-Chat-1M, a large-scale dataset containing one million real-world conversations with 25 state-of-the-art LLMs. This dataset is collected from 210K unique IP addresses in the wild on our Vicuna demo and…

    Submitted 10 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  30. arXiv:2306.05685  [pdf, other]

    cs.CL cs.AI

    Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

    Authors: Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, Ion Stoica

    Abstract: Evaluating large language model (LLM) based chat assistants is challenging due to their broad capabilities and the inadequacy of existing benchmarks in measuring human preferences. To address this, we explore using strong LLMs as judges to evaluate these models on more open-ended questions. We examine the usage and limitations of LLM-as-a-judge, including position, verbosity, and self-enhancement…

    Submitted 23 December, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 Datasets and Benchmarks Track

  31. arXiv:2306.04898  [pdf, other]

    cs.LG cs.CV

    Understanding Masked Autoencoders via Hierarchical Latent Variable Models

    Authors: Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis-Philippe Morency, Kun Zhang

    Abstract: Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empiric…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: CVPR 2023 Highlight

  32. arXiv:2305.02538  [pdf, other]

    cs.LG

    Cuttlefish: Low-Rank Model Training without All the Tuning

    Authors: Hongyi Wang, Saurabh Agarwal, Pongsakorn U-chupala, Yoshiki Tanaka, Eric P. Xing, Dimitris Papailiopoulos

    Abstract: Recent research has shown that training low-rank neural networks can effectively reduce the total number of trainable parameters without sacrificing predictive accuracy, resulting in end-to-end speedups. However, low-rank model training necessitates adjusting several additional factorization hyperparameters, such as the rank of the factorization at each layer. In this paper, we tackle this challen…

    Submitted 5 May, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted for presentation at MLSys 2023

  33. arXiv:2302.04228  [pdf, other]

    cs.LG

    Federated Learning as Variational Inference: A Scalable Expectation Propagation Approach

    Authors: Han Guo, Philip Greengard, Hongyi Wang, Andrew Gelman, Yoon Kim, Eric P. Xing

    Abstract: The canonical formulation of federated learning treats it as a distributed optimization problem where the model parameters are optimized against a global loss function that decomposes across client loss functions. A recent alternative formulation instead treats federated learning as a distributed inference problem, where the goal is to infer a global posterior from partitioned client data (Al-Shed…

    Submitted 8 February, 2023; originally announced February 2023.

  34. arXiv:2301.02654  [pdf, other]

    cs.LG

    Does compressing activations help model parallel training?

    Authors: Song Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman

    Abstract: Large-scale Transformer models are known for their exceptional performance in a range of tasks, but training them can be difficult due to the requirement for communication-intensive model parallelism. One way to improve training speed is to compress the message size in communication. Previous approaches have primarily focused on compressing gradients in a data parallelism setting, but compression…

    Submitted 6 January, 2023; originally announced January 2023.

    Comments: 16 pages, 5 figures

  35. arXiv:2212.04875  [pdf, other]

    cs.CV cs.AI

    Expeditious Saliency-guided Mix-up through Random Gradient Thresholding

    Authors: Minh-Long Luu, Zeyi Huang, Eric P. Xing, Yong Jae Lee, Haohan Wang

    Abstract: Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. Over the years, the research community expands mix-up methods into two directions, with extensive efforts to improve saliency-guided procedures but minimal focus on the arbitrary path, leaving the randomization domain unexplored. In this paper, inspired by the superior qualities…

    Submitted 10 August, 2023; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: Accepted Long paper at 2nd Practical-DL Workshop at AAAI 2023

  36. arXiv:2211.05322  [pdf, other]

    cs.LG cs.DC

    On Optimizing the Communication of Model Parallelism

    Authors: Yonghao Zhuang, Hexu Zhao, Lianmin Zheng, Zhuohan Li, Eric P. Xing, Qirong Ho, Joseph E. Gonzalez, Ion Stoica, Hao Zhang

    Abstract: We study a novel and important communication pattern in large-scale model-parallel deep learning (DL), which we call cross-mesh resharding. This pattern emerges when the two paradigms of model parallelism - intra-operator and inter-operator parallelism - are combined to support large models on large clusters. In cross-mesh resharding, a sharded tensor needs to be sent from a source device mesh to…

    Submitted 18 August, 2024; v1 submitted 9 November, 2022; originally announced November 2022.

  37. arXiv:2211.01452  [pdf, other]

    cs.LG cs.CR

    MPCFormer: fast, performant and private Transformer inference with MPC

    Authors: Dacheng Li, Rulin Shao, Hongyi Wang, Han Guo, Eric P. Xing, Hao Zhang

    Abstract: Enabling private inference is crucial for many cloud inference services that are based on Transformer models. However, existing private inference solutions can increase the inference latency by more than 60x or significantly compromise the inference quality. In this paper, we design the framework MPCFORMER as a practical solution, using Secure Multi-Party Computation (MPC) and Knowledge Distillati…

    Submitted 16 March, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

  38. arXiv:2210.04325  [pdf, other]

    cs.CL cs.AI cs.LG

    ASDOT: Any-Shot Data-to-Text Generation with Pretrained Language Models

    Authors: Jiannan Xiang, Zhengzhong Liu, Yucheng Zhou, Eric P. Xing, Zhiting Hu

    Abstract: Data-to-text generation is challenging due to the great variety of the input data in terms of domains (e.g., finance vs sports) or schemata (e.g., diverse predicates). Recent end-to-end neural methods thus require substantial training examples to learn to disambiguate and describe the data. Yet, real-world data-to-text problems often suffer from various data-scarce issues: one may have access to o…

    Submitted 22 October, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: Findings of EMNLP 2022

  39. arXiv:2208.00219  [pdf, other]

    cs.CV cs.AI cs.LG cs.MM

    Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation

    Authors: Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, Eric P. Xing

    Abstract: Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) negligence of the inter-class correlation among different classes. Such limitations hinder the generalization of base-cla…

    Submitted 30 July, 2022; originally announced August 2022.

    Comments: Accepted by T-PAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence). Codes: https://github.com/ZhangGongjie/Meta-DETR

  40. arXiv:2207.14172  [pdf, other]

    cs.CV

    Semantic-Aligned Matching for Enhanced DETR Convergence and Multi-Scale Feature Fusion

    Authors: Gongjie Zhang, Zhipeng Luo, Jiaxing Huang, Shijian Lu, Eric P. Xing

    Abstract: The recently proposed DEtection TRansformer (DETR) has established a fully end-to-end paradigm for object detection. However, DETR suffers from slow training convergence, which hinders its applicability to various detection tasks. We observe that DETR's slow convergence is largely attributed to the difficulty in matching object queries to relevant regions due to the unaligned semantics between obj…

    Submitted 6 February, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

  41. arXiv:2207.08944  [pdf, other]

    cs.CV cs.LG

    Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning

    Authors: Chonghan Chen, Haohan Wang, Leyang Hu, Yuhao Zhang, Shuguang Lyu, Jingcheng Wu, Xinnuo Li, Linjing Sun, Eric P. Xing

    Abstract: We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that the lack of robustness in machine learning models stems from the models' tendency to learn spurious features, we aim to solve this problem from its root at the data perspective…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: This paper introduces the first release of our software. The paper is expected to be updated as we continue to develop the software

  42. arXiv:2207.08943  [pdf, ps, other]

    cs.CL cs.LG

    MRCLens: an MRC Dataset Bias Detection Toolkit

    Authors: Yifan Zhong, Haohan Wang, Eric P. Xing

    Abstract: Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension, but evidence suggests that the models sometimes exploit dataset biases to make predictions and fail to generalize to out-of-sample data. While many other approaches have been proposed to address this issue from the computation perspective such as new architectures or training procedures, we believe a me…

    Submitted 18 July, 2022; originally announced July 2022.

    Comments: DataPerf workshop at ICML

  43. arXiv:2206.14268  [pdf, other]

    cs.CL

    BertNet: Harvesting Knowledge Graphs with Arbitrary Relations from Pretrained Language Models

    Authors: Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric P. Xing, Zhiting Hu

    Abstract: It is crucial to automatically construct knowledge graphs (KGs) of diverse new relations to support knowledge discovery and broad applications. Previous KG construction methods, based on either crowdsourcing or text mining, are often limited to a small predefined set of relations due to manual cost or restrictions in the text corpus. Recent research proposed to use pretrained language models (LMs) as…

    Submitted 2 June, 2023; v1 submitted 28 June, 2022; originally announced June 2022.

    Comments: ACL 2023 (Findings); Code available at https://github.com/tanyuqian/knowledge-harvest-from-lms

  44. arXiv:2206.01909  [pdf, ps, other]

    cs.LG

    Toward Learning Robust and Invariant Representations with Alignment Regularization and Data Augmentation

    Authors: Haohan Wang, Zeyi Huang, Xindi Wu, Eric P. Xing

    Abstract: Data augmentation has been proven to be an effective technique for developing machine learning models that are robust to known classes of distributional shifts (e.g., rotations of images), and alignment regularization is a technique often used together with data augmentation to further help the model learn representations invariant to the shifts used to augment the data. In this paper, motivated b…

    Submitted 4 June, 2022; originally announced June 2022.

    Comments: to appear at KDD 2022, the software package is at https://github.com/jyanln/AlignReg. arXiv admin note: text overlap with arXiv:2011.13052

  45. arXiv:2205.12548  [pdf, other]

    cs.CL cs.LG

    RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning

    Authors: Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric P. Xing, Zhiting Hu

    Abstract: Prompting has shown impressive success in enabling large pretrained language models (LMs) to perform diverse NLP tasks, especially when only a small amount of downstream data is available. Automatically finding the optimal prompt for each task, however, is challenging. Most existing work resorts to tuning soft prompts (e.g., embeddings), which fall short of interpretability, reusability across LMs, and applicab…

    Submitted 22 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EMNLP 2022 Camera Ready. Code available at https://github.com/mingkaid/rl-prompt

  46. arXiv:2204.04384  [pdf, other]

    cs.LG cs.CV

    The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization

    Authors: Zeyi Huang, Haohan Wang, Dong Huang, Yong Jae Lee, Eric P. Xing

    Abstract: Training with an emphasis on "hard-to-learn" components of the data has been proven to be an effective method to improve the generalization of machine learning models, especially in settings where robustness (e.g., generalization across distributions) is valued. Existing literature discussing this "hard-to-learn" concept is mainly expanded either along the dimension of the samples or the dimensi…

    Submitted 9 April, 2022; originally announced April 2022.

    Comments: to appear at CVPR 2022

  47. arXiv:2202.01336  [pdf, other]

    cs.LG

    Exploring Transformer Backbones for Heterogeneous Treatment Effect Estimation

    Authors: Yi-Fan Zhang, Hanlin Zhang, Zachary C. Lipton, Li Erran Li, Eric P. Xing

    Abstract: Previous works on Treatment Effect Estimation (TEE) are not in widespread use because they are predominantly theoretical, making strong parametric assumptions that are intractable for practical application. Recent work uses multilayer perceptrons (MLPs) for modeling causal relationships; however, MLPs lag far behind recent advances in ML methodology, which limits their applicability and generaliz…

    Submitted 17 October, 2022; v1 submitted 2 February, 2022; originally announced February 2022.

  48. arXiv:2201.12023  [pdf, other]

    cs.LG cs.DC cs.PL

    Alpa: Automating Inter- and Intra-Operator Parallelism for Distributed Deep Learning

    Authors: Lianmin Zheng, Zhuohan Li, Hao Zhang, Yonghao Zhuang, Zhifeng Chen, Yanping Huang, Yida Wang, Yuanzhong Xu, Danyang Zhuo, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica

    Abstract: Alpa automates model-parallel training of large deep learning (DL) models by generating execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel training systems either require users to manually create a parallelization plan or automatically generate one from a limited space of model parallelism configurations. They do not suffice to scale out complex DL models…

    Submitted 28 June, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

    Comments: OSDI 2022

  49. arXiv:2111.13839  [pdf, other]

    cs.LG cs.CV

    Towards Principled Disentanglement for Domain Generalization

    Authors: Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian Weller, Bernhard Schölkopf, Eric P. Xing

    Abstract: A fundamental challenge for machine learning models is generalizing to out-of-distribution (OOD) data, in part due to spurious correlations. To tackle this challenge, we first formalize the OOD generalization problem as constrained optimization, called Disentanglement-constrained Domain Generalization (DDG). We relax this non-trivial constrained optimization problem to a tractable form with finite…

    Submitted 19 October, 2022; v1 submitted 27 November, 2021; originally announced November 2021.

    Comments: CVPR 2022 Oral

  50. arXiv:2111.01104  [pdf, other]

    stat.ML cs.AI cs.LG

    NOTMAD: Estimating Bayesian Networks with Sample-Specific Structures and Parameters

    Authors: Ben Lengerich, Caleb Ellington, Bryon Aragam, Eric P. Xing, Manolis Kellis

    Abstract: Context-specific Bayesian networks (i.e. directed acyclic graphs, DAGs) identify context-dependent relationships between variables, but the non-convexity induced by the acyclicity requirement makes it difficult to share information between context-specific estimators (e.g. with graph generator functions). For this reason, existing methods for inferring context-specific Bayesian networks have favor…

    Submitted 1 November, 2021; originally announced November 2021.