Showing 1–23 of 23 results for author: Klimovic, A

  1. arXiv:2505.01603  [pdf, other]

    cs.DC cs.OS

    Unlocking True Elasticity for the Cloud-Native Era with Dandelion

    Authors: Tom Kuchler, Pinghe Li, Yazhuo Zhang, Lazar Cvetković, Boris Goranov, Tobias Stocker, Leon Thomm, Simone Kalbermatter, Tim Notter, Andrea Lattuada, Ana Klimovic

    Abstract: Elasticity is fundamental to cloud computing, as it enables quickly allocating resources to match the demand of each workload as it arrives, rather than pre-provisioning resources to meet performance objectives. However, even serverless platforms -- which boot sandboxes in 10s to 100s of milliseconds -- are not sufficiently elastic to avoid over-provisioning expensive resources. Today's FaaS platf…

    Submitted 2 May, 2025; originally announced May 2025.

    Comments: 13 pages, 10 figures

  2. arXiv:2504.17096  [pdf, other]

    cs.DC

    Sailor: Automating Distributed Training over Dynamic, Heterogeneous, and Geo-distributed Clusters

    Authors: Foteini Strati, Zhendong Zhang, George Manos, Ixeia Sánchez Périz, Qinghao Hu, Tiancheng Chen, Berk Buzcu, Song Han, Pamela Delgado, Ana Klimovic

    Abstract: The high GPU demand of ML training makes it hard to allocate large homogeneous clusters of high-end GPUs in a single availability zone. Leveraging heterogeneous GPUs available within and across zones can improve throughput at a reasonable cost. However, training ML models on heterogeneous resources introduces significant challenges, such as stragglers and a large search space of possible job confi…

    Submitted 23 April, 2025; originally announced April 2025.

  3. arXiv:2502.19790  [pdf, other]

    cs.LG cs.AI cs.DB

    Mixtera: A Data Plane for Foundation Model Training

    Authors: Maximilian Böther, Xiaozhe Yao, Tolga Kerimoglu, Dan Graur, Viktor Gsteiger, Ana Klimovic

    Abstract: State-of-the-art large language and vision models are trained over trillions of tokens that are aggregated from a large variety of sources. As training data collections grow, manually managing the samples becomes time-consuming, tedious, and prone to errors. Yet recent research shows that the data mixture and the order in which samples are visited during training can significantly influence model…

    Submitted 3 April, 2025; v1 submitted 27 February, 2025; originally announced February 2025.

    Comments: under submission

  4. arXiv:2502.09334  [pdf, other]

    cs.DC

    ThunderServe: High-performance and Cost-efficient LLM Serving in Cloud Environments

    Authors: Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Taiyi Wang, Bin Cui, Ana Klimovic, Eiko Yoneki

    Abstract: Recent developments in large language models (LLMs) have demonstrated their remarkable proficiency in a range of tasks. Compared to in-house homogeneous GPU clusters, deploying LLMs in cloud environments with diverse types of GPUs is crucial for addressing the GPU shortage problem and being more cost-effective. However, the diversity of network environments and various GPU types on the cloud bring…

    Submitted 13 February, 2025; originally announced February 2025.

    Comments: MLSys 2025

  5. arXiv:2502.08235  [pdf, other]

    cs.AI

    The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks

    Authors: Alejandro Cuadron, Dacheng Li, Wenjie Ma, Xingyao Wang, Yichuan Wang, Siyuan Zhuang, Shu Liu, Luis Gaspar Schroeder, Tian Xia, Huanzhi Mao, Nicholas Thumiger, Aditya Desai, Ion Stoica, Ana Klimovic, Graham Neubig, Joseph E. Gonzalez

    Abstract: Large Reasoning Models (LRMs) represent a breakthrough in AI problem-solving capabilities, but their effectiveness in interactive environments can be limited. This paper introduces and analyzes overthinking in LRMs, a phenomenon where models favor extended internal reasoning chains over environmental interaction. Through experiments on software engineering tasks using SWE Bench Verified, we observ…

    Submitted 12 February, 2025; originally announced February 2025.

  6. arXiv:2502.00722  [pdf, ps, other]

    cs.DC

    Demystifying Cost-Efficiency in LLM Serving over Heterogeneous GPUs

    Authors: Youhe Jiang, Fangcheng Fu, Xiaozhe Yao, Guoliang He, Xupeng Miao, Ana Klimovic, Bin Cui, Binhang Yuan, Eiko Yoneki

    Abstract: Recent advancements in Large Language Models (LLMs) have led to increasingly diverse requests, accompanied with varying resource (compute and memory) demands to serve them. However, this in turn degrades the cost-efficiency of LLM serving as common practices primarily rely on homogeneous GPU resources. In response to this problem, this work conducts a thorough study about serving LLMs over heterog…

    Submitted 5 June, 2025; v1 submitted 2 February, 2025; originally announced February 2025.

  7. arXiv:2501.16909  [pdf, other]

    cs.DC

    Measuring GPU utilization one level deeper

    Authors: Paul Elvinger, Foteini Strati, Natalie Enright Jerger, Ana Klimovic

    Abstract: GPU hardware is vastly underutilized. Even resource-intensive AI applications have diverse resource profiles that often leave parts of GPUs idle. While colocating applications can improve utilization, current spatial sharing systems lack performance guarantees. Providing predictable performance guarantees requires a deep understanding of how applications contend for shared GPU resources such as bl…

    Submitted 12 February, 2025; v1 submitted 28 January, 2025; originally announced January 2025.

  8. arXiv:2407.00839  [pdf, ps, other]

    cs.DC cs.NI cs.OS

    Imaginary Machines: A Serverless Model for Cloud Applications

    Authors: Michael Wawrzoniak, Rodrigo Bruno, Ana Klimovic, Gustavo Alonso

    Abstract: Serverless Function-as-a-Service (FaaS) platforms provide applications with resources that are highly elastic, quick to instantiate, accounted at fine granularity, and without the need for explicit runtime resource orchestration. This combination of the core properties underpins the success and popularity of the serverless FaaS paradigm. However, these benefits are not available to most cloud appl…

    Submitted 30 June, 2024; originally announced July 2024.

  9. arXiv:2407.00832  [pdf, other]

    cs.DC cs.NI cs.OS

    Boxer: FaaSt Ephemeral Elasticity for Off-the-Shelf Cloud Applications

    Authors: Michael Wawrzoniak, Rodrigo Bruno, Ana Klimovic, Gustavo Alonso

    Abstract: Elasticity is a key property of cloud computing. However, elasticity is offered today at the granularity of virtual machines, which take tens of seconds to start. This is insufficient to react to load spikes and sudden failures in latency sensitive applications, leading users to resort to expensive overprovisioning. Function-as-a-Service (FaaS) provides significantly higher elasticity than VMs, bu…

    Submitted 30 June, 2024; originally announced July 2024.

  10. Dirigent: Lightweight Serverless Orchestration

    Authors: Lazar Cvetković, François Costa, Mihajlo Djokic, Michal Friedman, Ana Klimovic

    Abstract: While Function as a Service (FaaS) platforms can initialize function sandboxes on worker nodes in 10-100s of milliseconds, the latency to schedule functions in real FaaS clusters can be orders of magnitude higher. The current approach of building FaaS cluster managers on top of legacy orchestration systems (e.g., Kubernetes) leads to high scheduling delays when clusters experience high sandbox chu…

    Submitted 28 October, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  11. arXiv:2403.01876  [pdf, other]

    cs.DC

    DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

    Authors: Foteini Strati, Sara Mcallister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic

    Abstract: Distributed LLM serving is costly and often underutilizes hardware accelerators due to three key challenges: bubbles in pipeline-parallel deployments caused by the bimodal latency of prompt and token processing, GPU memory overprovisioning, and long recovery times in case of failures. In this paper, we propose DéjàVu, a system to address all these challenges using a versatile and efficient KV cach…

    Submitted 4 March, 2024; originally announced March 2024.

  12. arXiv:2402.16442  [pdf, other]

    cs.LG cs.AI cs.CV cs.DC math.OC

    On Distributed Larger-Than-Memory Subset Selection With Pairwise Submodular Functions

    Authors: Maximilian Böther, Abraham Sebastian, Pranjal Awasthi, Ana Klimovic, Srikumar Ramalingam

    Abstract: Modern datasets span billions of samples, making training on all available data infeasible. Selecting a high quality subset helps in reducing training costs and enhancing model quality. Submodularity, a discrete analogue of convexity, is commonly used for solving such subset selection problems. However, existing algorithms for optimizing submodular functions are sequential, and the prior distribut…

    Submitted 3 April, 2025; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: accepted at MLSys 2025

  13. arXiv:2312.06254  [pdf, other]

    cs.LG cs.AI cs.DB cs.DC stat.ML

    Modyn: Data-Centric Machine Learning Pipeline Orchestration

    Authors: Maximilian Böther, Ties Robroek, Viktor Gsteiger, Robin Holzinger, Xianzhe Ma, Pınar Tözün, Ana Klimovic

    Abstract: In real-world machine learning (ML) pipelines, datasets are continuously growing. Models must incorporate this new training data to improve generalization and adapt to potential distribution shifts. The cost of model retraining is proportional to how frequently the model is retrained and how much data it is trained on, which makes the naive approach of retraining from scratch each time impractical…

    Submitted 24 January, 2025; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: final version published at SIGMOD'25; 30 pages

  14. arXiv:2312.05215  [pdf, other]

    cs.DC cs.LG

    DeltaZip: Efficient Serving of Multiple Full-Model-Tuned LLMs

    Authors: Xiaozhe Yao, Qinghao Hu, Ana Klimovic

    Abstract: Fine-tuning large language models (LLMs) greatly improves model quality for downstream tasks. However, serving many fine-tuned LLMs concurrently is challenging due to the sporadic, bursty, and varying request patterns of different LLMs. To bridge this gap, we present DeltaZip, an LLM serving system that efficiently serves multiple full-parameter fine-tuned models concurrently by aggressively compr…

    Submitted 25 March, 2025; v1 submitted 8 December, 2023; originally announced December 2023.

    Comments: EuroSys 2025

  15. tf.data service: A Case for Disaggregating ML Input Data Processing

    Authors: Andrew Audibert, Yang Chen, Dan Graur, Ana Klimovic, Jiri Simsa, Chandramohan A. Thekkath

    Abstract: Machine learning (ML) computations commonly execute on expensive specialized hardware, such as GPUs and TPUs, which provide high FLOPs and performance-per-watt. For cost efficiency, it is essential to keep these accelerators highly utilized. This requires preprocessing input data at the rate at which the accelerators can ingest and perform ML computations on the data. To avoid data stalls, the hos…

    Submitted 2 January, 2024; v1 submitted 26 October, 2022; originally announced October 2022.

  16. arXiv:2205.11261  [pdf, other]

    cs.DC cs.DB

    An Elastic Ephemeral Datastore using Cheap, Transient Cloud Resources

    Authors: Malte Brodmann, Nikolas Ioannou, Bernard Metzler, Jonas Pfefferle, Ana Klimovic

    Abstract: Spot instances are virtual machines offered at 60-90% lower cost that can be reclaimed at any time, with only a short warning period. Spot instances have already been used to significantly reduce the cost of processing workloads in the cloud. However, leveraging spot instances to reduce the cost of stateful cloud applications is much more challenging, as the sudden preemptions lead to data loss. I…

    Submitted 23 May, 2022; originally announced May 2022.

  17. arXiv:2204.01457  [pdf, other]

    cs.LG cs.DB

    SHiFT: An Efficient, Flexible Search Engine for Transfer Learning

    Authors: Cedric Renggli, Xiaozhe Yao, Luka Kolar, Luka Rimanic, Ana Klimovic, Ce Zhang

    Abstract: Transfer learning can be seen as a data- and compute-efficient alternative to training models from scratch. The emergence of rich model repositories, such as TensorFlow Hub, enables practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these repositories keep growing exponentially, efficiently selecting a good model for the task at hand…

    Submitted 28 September, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

  18. arXiv:2202.06646  [pdf, other]

    cs.DC

    Short-lived Datacenter

    Authors: Michael Wawrzoniak, Ingo Müller, Rodrigo Bruno, Ana Klimovic, Gustavo Alonso

    Abstract: Serverless platforms have attracted attention due to their promise of elasticity, low cost, and fast deployment. Instead of using a fixed virtual machine (VM) infrastructure, which can incur considerable costs to operate and run, serverless platforms support short computations, triggered on demand, with cost proportional to fine-grain function execution time. However, serverless platforms offer a…

    Submitted 14 February, 2022; originally announced February 2022.

  19. arXiv:2112.00425  [pdf, other]

    cs.DB

    How to use Persistent Memory in your Database

    Authors: Dimitrios Koutsoukos, Raghav Bhartia, Ana Klimovic, Gustavo Alonso

    Abstract: Persistent or Non Volatile Memory (PMEM or NVM) has recently become commercially available under several configurations with different purposes and goals. Despite the attention to the topic, we are not aware of a comprehensive empirical analysis of existing relational database engines under different PMEM configurations. Such a study is important to understand the performance implications of the v…

    Submitted 1 December, 2021; originally announced December 2021.

  20. arXiv:2111.04131  [pdf, other]

    cs.LG cs.PF

    Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

    Authors: Michael Kuchnik, Ana Klimovic, Jiri Simsa, Virginia Smith, George Amvrosiadis

    Abstract: Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over two million ML jobs in Google datacenters reveals that a significant fraction of…

    Submitted 21 March, 2022; v1 submitted 7 November, 2021; originally announced November 2021.

  21. Towards Demystifying Serverless Machine Learning Training

    Authors: Jiawei Jiang, Shaoduo Gan, Yue Liu, Fanlin Wang, Gustavo Alonso, Ana Klimovic, Ankit Singla, Wentao Wu, Ce Zhang

    Abstract: The appeal of serverless (FaaS) has triggered a growing interest on how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several systems exist for training large-scale ML models on top of serverless infrastructures (e.g., AWS Lambda) but with inconclusive results in terms of their performance and relative advantage over "serverful" infrastructures (…

    Submitted 17 May, 2021; originally announced May 2021.

  22. arXiv:2101.12127  [pdf, other]

    cs.LG cs.MS

    tf.data: A Machine Learning Data Processing Framework

    Authors: Derek G. Murray, Jiri Simsa, Ana Klimovic, Ihor Indyk

    Abstract: Training machine learning models requires feeding input data for models to ingest. Input pipelines for machine learning jobs are often challenging to implement efficiently as they require reading large volumes of data, applying complex transformations, and transferring data to hardware accelerators while overlapping computation and communication to achieve optimal performance. We present tf.data,…

    Submitted 23 February, 2021; v1 submitted 28 January, 2021; originally announced January 2021.

  23. arXiv:2004.03488  [pdf, other]

    cs.DB

    Modularis: Modular Relational Analytics over Heterogeneous Distributed Platforms

    Authors: Dimitrios Koutsoukos, Ingo Müller, Renato Marroquín, Ana Klimovic, Gustavo Alonso

    Abstract: The enormous quantity of data produced every day together with advances in data analytics has led to a proliferation of data management and analysis systems. Typically, these systems are built around highly specialized monolithic operators optimized for the underlying hardware. While effective in the short term, such an approach makes the operators cumbersome to port and adapt, which is increasing…

    Submitted 29 September, 2021; v1 submitted 7 April, 2020; originally announced April 2020.

    Comments: Accepted at PVLDB vol. 14