Showing 1–31 of 31 results for author: Kozyrakis, C

  1. arXiv:2504.19925

    cs.DC cs.LG

    Accelerating Mixture-of-Experts Training with Adaptive Expert Replication

    Authors: Athinagoras Skiadopoulos, Mark Zhao, Swapnil Gandhi, Thomas Norrie, Shrijeet Mukherjee, Christos Kozyrakis

    Abstract: Mixture-of-Experts (MoE) models have become a widely adopted solution to continue scaling model sizes without a corresponding linear increase in compute. During MoE model training, each input token is dynamically routed to a subset of experts -- sparsely-activated feed-forward networks -- within each transformer layer. The distribution of tokens assigned to each expert varies widely and rapidly ov…

    Submitted 28 April, 2025; originally announced April 2025.

    Comments: Preprint. Under review

  2. arXiv:2504.18082

    cs.LG cs.AI

    Efficient GNN Training Through Structure-Aware Randomized Mini-Batching

    Authors: Vignesh Balaji, Christos Kozyrakis, Gal Chechik, Haggai Maron

    Abstract: Graph Neural Networks (GNNs) enable learning on real-world graphs and mini-batch training has emerged as the de facto standard for training GNNs because it can scale to very large graphs and improve convergence. Current mini-batch construction policies largely ignore efficiency considerations of GNN training. Specifically, existing mini-batching techniques employ randomization schemes to improve ac…

    Submitted 25 April, 2025; originally announced April 2025.

  3. arXiv:2412.15411

    cs.DC

    MoEtion: Efficient and Reliable Sparse Checkpointing for Mixture-of-Experts Models at Scale

    Authors: Swapnil Gandhi, Christos Kozyrakis

    Abstract: As large language models continue to scale, training them requires thousands of GPUs over prolonged durations--making frequent failures an inevitable reality. While checkpointing remains the primary fault-tolerance mechanism, existing methods struggle to efficiently support Mixture-of-Experts (MoE) models. Due to the substantially larger training state of MoE models, traditional checkpointing tech…

    Submitted 29 April, 2025; v1 submitted 19 December, 2024; originally announced December 2024.

  4. arXiv:2411.03519

    cs.DC cs.AI cs.LG cs.MA

    AI Metropolis: Scaling Large Language Model-based Multi-Agent Simulation with Out-of-order Execution

    Authors: Zhiqiang Xie, Hao Kang, Ying Sheng, Tushar Krishna, Kayvon Fatahalian, Christos Kozyrakis

    Abstract: With more advanced natural language understanding and reasoning capabilities, large language model (LLM)-powered agents are increasingly developed in simulated environments to perform complex tasks, interact with other agents, and exhibit emergent behaviors relevant to social science and gaming. However, current multi-agent simulations frequently suffer from inefficiencies due to the limited paral…

    Submitted 5 November, 2024; originally announced November 2024.

  5. arXiv:2410.01032

    cs.CY

    Teaching Cloud Infrastructure and Scalable Application Deployment in an Undergraduate Computer Science Program

    Authors: Aditya Saligrama, Cody Ho, Benjamin Tripp, Michael Abbott, Christos Kozyrakis

    Abstract: Making successful use of cloud computing requires nuanced approaches to both system design and deployment methodology, involving reasoning about the elasticity, cost, and security models of cloud services. Building cloud-native applications without a firm understanding of the fundamentals of cloud engineering can leave students susceptible to cost and security pitfalls. Yet, cloud computing is not…

    Submitted 2 December, 2024; v1 submitted 1 October, 2024; originally announced October 2024.

    Comments: To appear in SIGCSE TS 2025

  6. arXiv:2408.17351

    cs.OS

    Tide: A Split OS Architecture for Control Plane Offloading

    Authors: Jack Tigar Humphries, Neel Natu, Kostis Kaffes, Stanko Novaković, Paul Turner, Hank Levy, David Culler, Christos Kozyrakis

    Abstract: The end of Moore's Law is driving cloud providers to offload virtualization and the network data plane to SmartNICs to improve compute efficiency. Even though individual OS control plane tasks consume up to 5% of cycles across the fleet, they remain on the host CPU because they are tightly intertwined with OS mechanisms. Moreover, offloading puts the slow PCIe interconnect in the critical path of…

    Submitted 20 October, 2024; v1 submitted 30 August, 2024; originally announced August 2024.

    Comments: About 11 pages

  7. arXiv:2407.08694

    cs.DC cs.AI cs.LG

    Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight

    Authors: Zhiqiang Xie, Yujia Zheng, Lizi Ottens, Kun Zhang, Christos Kozyrakis, Jonathan Mace

    Abstract: Runtime failure and performance degradation is commonplace in modern cloud systems. For cloud providers, automatically determining the root cause of incidents is paramount to ensuring high reliability and availability as prompt fault localization can enable faster diagnosis and triage for timely resolution. A compelling solution explored in recent work is causal reasoning using causal graphs to ca…

    Submitted 11 July, 2024; originally announced July 2024.

  8. ReCycle: Resilient Training of Large DNNs using Pipeline Adaptation

    Authors: Swapnil Gandhi, Mark Zhao, Athinagoras Skiadopoulos, Christos Kozyrakis

    Abstract: Training large Deep Neural Network (DNN) models requires thousands of GPUs over the course of several days or weeks. At this scale, failures are frequent and can have a big impact on training throughput. Utilizing spare GPU servers to mitigate performance loss becomes increasingly costly as model sizes grow. ReCycle is a system designed for efficient DNN training in the presence of failures, witho…

    Submitted 25 September, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: SOSP'24 | Camera-Ready

  9. arXiv:2401.08895

    cs.LG cs.DC cs.PF

    cedar: Optimized and Unified Machine Learning Input Data Pipelines

    Authors: Mark Zhao, Emanuel Adamiak, Christos Kozyrakis

    Abstract: The input data pipeline is an essential component of each machine learning (ML) training job. It is responsible for reading massive amounts of training data, processing batches of samples using complex transformations, and loading them onto training nodes at low latency and high throughput. Performant input data systems are becoming increasingly critical, driven by skyrocketing data volumes and tr…

    Submitted 27 November, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: Published in PVLDB Volume 18, Issue 2

  10. arXiv:2312.07104

    cs.AI cs.PL

    SGLang: Efficient Execution of Structured Language Model Programs

    Authors: Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng

    Abstract: Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming and executing these applications. We introduce SGLang, a system for efficient execution of complex language model programs. SGLang consists of a frontend langua…

    Submitted 5 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

  11. arXiv:2305.03785

    cs.DB

    Zelda: Video Analytics using Vision-Language Models

    Authors: Francisco Romero, Caleb Winston, Johann Hauswald, Matei Zaharia, Christos Kozyrakis

    Abstract: Advances in ML have motivated the design of video analytics systems that allow for structured queries over video datasets. However, existing systems limit query expressivity, require users to specify an ML model per predicate, rely on complex optimizations that trade off accuracy for performance, and return large amounts of redundant and low-quality results. This paper focuses on the recently deve…

    Submitted 7 November, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

  12. arXiv:2301.02959

    cs.LG cs.DC cs.IR cs.PF

    FlexShard: Flexible Sharding for Industry-Scale Sequence Recommendation Models

    Authors: Geet Sethi, Pallab Bhattacharya, Dhruv Choudhary, Carole-Jean Wu, Christos Kozyrakis

    Abstract: Sequence-based deep learning recommendation models (DLRMs) are an emerging class of DLRMs showing great improvements over their prior sum-pooling based counterparts at capturing users' long term interests. These improvements come at immense system cost however, with sequence-based DLRMs requiring substantial amounts of data to be dynamically materialized and communicated by each accelerator during…

    Submitted 7 January, 2023; originally announced January 2023.

  13. arXiv:2212.14161

    cs.DB cs.DC cs.SE

    Transactions Make Debugging Easy

    Authors: Qian Li, Peter Kraft, Michael Cafarella, Çağatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Matei Zaharia

    Abstract: We propose TROD, a novel transaction-oriented framework for debugging modern distributed web applications and online services. Our critical insight is that if applications store all state in databases and only access state transactionally, TROD can use lightweight always-on tracing to track the history of application state changes and data provenance, and then leverage the captured traces and tran…

    Submitted 28 December, 2022; originally announced December 2022.

    Comments: CIDR'23

  14. arXiv:2211.05239

    cs.LG cs.DC cs.IR cs.PF

    RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure

    Authors: Mark Zhao, Dhruv Choudhary, Devashish Tyagi, Ajay Somani, Max Kaplan, Sung-Han Lin, Sarunya Pumma, Jongsoo Park, Aarti Basant, Niket Agarwal, Carole-Jean Wu, Christos Kozyrakis

    Abstract: We present RecD (Recommendation Deduplication), a suite of end-to-end infrastructure optimizations across the Deep Learning Recommendation Model (DLRM) training pipeline. RecD addresses immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets. Feature duplication arises because DLRM datasets are generated from interactio…

    Submitted 1 May, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: Published in the Proceedings of the Sixth Conference on Machine Learning and Systems (MLSys 2023)

  15. arXiv:2208.13068

    cs.DB cs.DC

    Apiary: A DBMS-Integrated Transactional Function-as-a-Service Framework

    Authors: Peter Kraft, Qian Li, Kostis Kaffes, Athinagoras Skiadopoulos, Deeptaanshu Kumar, Danny Cho, Jason Li, Robert Redmond, Nathan Weckwerth, Brian Xia, Peter Bailis, Michael Cafarella, Goetz Graefe, Jeremy Kepner, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, Xiangyao Yu, Matei Zaharia

    Abstract: Developers increasingly use function-as-a-service (FaaS) platforms for data-centric applications that perform low-latency and transactional operations on data, such as for microservices or web serving. Unfortunately, existing FaaS platforms support these applications poorly because they physically and logically separate application logic, executed in cloud functions, from data management, done in…

    Submitted 30 June, 2023; v1 submitted 27 August, 2022; originally announced August 2022.

    Comments: 14 pages, 13 figures, 3 tables. Preprint

  16. arXiv:2201.10477

    cs.OS

    SOL: Safe On-Node Learning in Cloud Platforms

    Authors: Yawen Wang, Daniel Crankshaw, Neeraja J. Yadwadkar, Daniel Berger, Christos Kozyrakis, Ricardo Bianchini

    Abstract: Cloud platforms run many software agents on each server node. These agents manage all aspects of node operation, and in some cases frequently collect data and make decisions. Unfortunately, their behavior is typically based on pre-defined static heuristics or offline analysis; they do not leverage on-node machine learning (ML). In this paper, we first characterize the spectrum of node agents in Az…

    Submitted 25 January, 2022; originally announced January 2022.

  17. arXiv:2201.10095

    cs.LG cs.AR cs.DC cs.PF

    RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation

    Authors: Geet Sethi, Bilge Acun, Niket Agarwal, Christos Kozyrakis, Caroline Trippel, Carole-Jean Wu

    Abstract: We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement technique for deep learning recommendation models (DLRMs). RecShard is designed based on two key observations. First, not all EMBs are equal, nor all rows within an EMB are equal in terms of access patterns. EMBs exhibit distinct memory characteristics, providing performance optimization opportunities for intellig…

    Submitted 24 January, 2022; originally announced January 2022.

  18. arXiv:2111.07226

    cs.DC

    Practical Scheduling for Real-World Serverless Computing

    Authors: Kostis Kaffes, Neeraja J. Yadwadkar, Christos Kozyrakis

    Abstract: Serverless computing has seen rapid growth due to the ease-of-use and cost-efficiency it provides. However, function scheduling, a critical component of serverless systems, has been overlooked. In this paper, we take a first-principles approach toward designing a scheduler that caters to the unique characteristics of serverless functions as seen in real-world deployments. We first create a taxonom…

    Submitted 13 November, 2021; originally announced November 2021.

  19. Understanding Data Storage and Ingestion for Large-Scale Deep Recommendation Model Training

    Authors: Mark Zhao, Niket Agarwal, Aarti Basant, Bugra Gedik, Satadru Pan, Mustafa Ozdal, Rakesh Komuravelli, Jerry Pan, Tianshu Bao, Haowei Lu, Sundaram Narayanan, Jack Langman, Kevin Wilfong, Harsha Rastogi, Carole-Jean Wu, Christos Kozyrakis, Parik Pol

    Abstract: Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators (DSA) are used to train increasingly-complex deep learning models. These clusters rely on a data storage and ingestion (DSI) pipeline, responsible for storing exabytes of training data and serving it at tens of terabytes per second. As DSAs continue to push training efficiency and throughput, the DSI pipe…

    Submitted 22 April, 2022; v1 submitted 20 August, 2021; originally announced August 2021.

    Comments: In The 49th Annual International Symposium on Computer Architecture (ISCA 2022)

  20. arXiv:2104.13869

    cs.DC

    Faa$T: A Transparent Auto-Scaling Cache for Serverless Applications

    Authors: Francisco Romero, Gohar Irfan Chaudhry, Íñigo Goiri, Pragna Gopa, Paul Batum, Neeraja J. Yadwadkar, Rodrigo Fonseca, Christos Kozyrakis, Ricardo Bianchini

    Abstract: Function-as-a-Service (FaaS) has become an increasingly popular way for users to deploy their applications without the burden of managing the underlying infrastructure. However, existing FaaS platforms rely on remote storage to maintain state, limiting the set of applications that can be run efficiently. Recent caching work for FaaS platforms has tried to address this problem, but has fallen short…

    Submitted 28 April, 2021; originally announced April 2021.

    Comments: 18 pages, 15 figures

  21. ShEF: Shielded Enclaves for Cloud FPGAs

    Authors: Mark Zhao, Mingyu Gao, Christos Kozyrakis

    Abstract: FPGAs are now used in public clouds to accelerate a wide range of applications, including many that operate on sensitive data such as financial and medical records. We present ShEF, a trusted execution environment (TEE) for cloud-based reconfigurable accelerators. ShEF is independent from CPU-based TEEs and allows secure execution under a threat model where the adversary can control all software r…

    Submitted 27 January, 2022; v1 submitted 5 March, 2021; originally announced March 2021.

  22. arXiv:2102.01887

    cs.DC

    Llama: A Heterogeneous & Serverless Framework for Auto-Tuning Video Analytics Pipelines

    Authors: Francisco Romero, Mark Zhao, Neeraja J. Yadwadkar, Christos Kozyrakis

    Abstract: The proliferation of camera-enabled devices and large video repositories has led to a diverse set of video analytics applications. These applications rely on video pipelines, represented as DAGs of operations, to transform videos, process extracted metadata, and answer questions like, "Is this intersection congested?" The latency and resource efficiency of pipelines can be optimized using configur…

    Submitted 28 May, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

  23. arXiv:2010.05969

    cs.DC

    RackSched: A Microsecond-Scale Scheduler for Rack-Scale Computers (Technical Report)

    Authors: Hang Zhu, Kostis Kaffes, Zixu Chen, Zhenming Liu, Christos Kozyrakis, Ion Stoica, Xin Jin

    Abstract: Low-latency online services have strict Service Level Objectives (SLOs) that require datacenter systems to support high throughput at microsecond-scale tail latency. Dataplane operating systems have been designed to scale up multi-core servers with minimal overhead for such SLOs. However, as application demands continue to increase, scaling up is not enough, and serving larger demands requires the…

    Submitted 15 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

  24. arXiv:2007.11112

    cs.OS cs.AR cs.DB cs.DC cs.NI

    DBOS: A Proposal for a Data-Centric Operating System

    Authors: Michael Cafarella, David DeWitt, Vijay Gadepally, Jeremy Kepner, Christos Kozyrakis, Tim Kraska, Michael Stonebraker, Matei Zaharia

    Abstract: Current operating systems are complex systems that were designed before today's computing environments. This makes it difficult for them to meet the scalability, heterogeneity, availability, and security challenges in current cloud and parallel computing environments. To address these problems, we propose a radically new OS design based on data-centric architecture: all operating system state shou…

    Submitted 21 July, 2020; originally announced July 2020.

  25. arXiv:1905.13348

    cs.DC cs.LG

    INFaaS: A Model-less and Managed Inference Serving System

    Authors: Francisco Romero, Qian Li, Neeraja J. Yadwadkar, Christos Kozyrakis

    Abstract: Despite existing work in machine learning inference serving, ease-of-use and cost efficiency remain challenges at large scales. Developers must manually search through thousands of model-variants -- versions of already-trained models that differ in hardware, resource footprints, latencies, costs, and accuracies -- to meet the diverse application requirements. Since requirements, query load, and ap…

    Submitted 15 December, 2020; v1 submitted 30 May, 2019; originally announced May 2019.

    Report number: https://www.usenix.org/system/files/atc21-romero.pdf

  26. arXiv:1903.07754

    cs.DC

    A New Frontier for Pull-Based Graph Processing

    Authors: Samuel Grossman, Christos Kozyrakis

    Abstract: The trade-off between pull-based and push-based graph processing engines is well-understood. On one hand, pull-based engines can achieve higher throughput because their workloads are read-dominant, rather than write-dominant, and can proceed without synchronization between threads. On the other hand, push-based engines are much better able to take advantage of the frontier optimization, which leve…

    Submitted 18 March, 2019; originally announced March 2019.

  27. arXiv:1812.09442

    cs.DC

    Trevor: Automatic configuration and scaling of stream processing pipelines

    Authors: Manu Bansal, Eyal Cidon, Arjun Balasingam, Aditya Gudipati, Christos Kozyrakis, Sachin Katti

    Abstract: Operating a distributed data stream processing workload efficiently at scale is hard. The operator of the workload must parallelize and lay out tasks of the workload with resources that match the requirement of target data rate. The challenge is that neither the operator nor the programmer is typically aware of the scaling behavior of the workload as a function of resources. An operator manually s…

    Submitted 21 December, 2018; originally announced December 2018.

  28. Interstellar: Using Halide's Scheduling Language to Analyze DNN Accelerators

    Authors: Xuan Yang, Mingyu Gao, Qiaoyi Liu, Jeff Ou Setter, Jing Pu, Ankita Nayak, Steven Emberton Bell, Kaidi Cao, Heonjae Ha, Priyanka Raina, Christos Kozyrakis, Mark Horowitz

    Abstract: We show that DNN accelerator micro-architectures and their program mappings represent specific choices of loop order and hardware parallelism for computing the seven nested loops of DNNs, which enables us to create a formal taxonomy of all existing dense DNN accelerators. Surprisingly, the loop transformations needed to create these hardware variants can be precisely and concisely represented by H…

    Submitted 26 April, 2020; v1 submitted 10 September, 2018; originally announced September 2018.

    Comments: Published as a conference paper at ASPLOS 2020

    ACM Class: C.1.4; C.3; C.4

    Journal ref: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, March, 2020, Pages 369-383

  29. arXiv:1803.02329

    cs.LG stat.ML

    Learning Memory Access Patterns

    Authors: Milad Hashemi, Kevin Swersky, Jamie A. Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

    Abstract: The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations, augmenting or replacing traditional heuristics and data structures. However, the space of machine learning for computer hardware architecture is only lightly expl…

    Submitted 6 March, 2018; originally announced March 2018.

  30. arXiv:1711.02294

    cs.NI cs.OS

    AppSwitch: Resolving the Application Identity Crisis

    Authors: Dinesh Subhraveti, Sri Goli, Serge Hallyn, Ravi Chamarthy, Christos Kozyrakis

    Abstract: Networked applications traditionally derive their identity from the identity of the host on which they run. The default application identity acquired from the host results in subtle and substantial problems related to application deployment, discovery and access, especially for modern distributed applications. A number of mechanisms and workarounds, often quite elaborate, are used to address those…

    Submitted 8 November, 2017; v1 submitted 7 November, 2017; originally announced November 2017.

  31. arXiv:1511.06968

    cs.DC cs.PL

    Generating Configurable Hardware from Parallel Patterns

    Authors: Raghu Prabhakar, David Koeplinger, Kevin Brown, HyoukJoong Lee, Christopher De Sa, Christos Kozyrakis, Kunle Olukotun

    Abstract: In recent years the computing landscape has seen an increasing shift towards specialized accelerators. Field programmable gate arrays (FPGAs) are particularly promising as they offer significant performance and energy improvements compared to CPUs for a wide class of applications and are far more flexible than fixed-function ASICs. However, FPGAs are difficult to program. Traditional programmi…

    Submitted 22 November, 2015; originally announced November 2015.