-
Solving Max-Min Fair Resource Allocations Quickly on Large Graphs
Authors:
Pooria Namyar,
Behnaz Arzani,
Srikanth Kandula,
Santiago Segarra,
Daniel Crankshaw,
Umesh Krishnaswamy,
Ramesh Govindan,
Himanshu Raj
Abstract:
We consider the max-min fair resource allocation problem. The best-known solutions use either a sequence of optimizations or waterfilling, which only applies to a narrow set of cases. These solutions have become a practical bottleneck in WAN traffic engineering and cluster scheduling, especially at larger problem sizes. We improve both approaches: (1) we show how to convert the optimization sequence into a single fast optimization, and (2) we generalize waterfilling to the multi-path case. We empirically show our new algorithms Pareto-dominate prior techniques: they produce faster, fairer, and more efficient allocations. Some of our allocators also have theoretical guarantees: they trade off a bounded amount of unfairness for faster allocation. We have deployed our allocators in Azure's WAN traffic engineering pipeline, where we preserve solution quality and achieve a roughly $3\times$ speedup.
Submitted 14 October, 2023;
originally announced October 2023.
-
Mitigating the Performance Impact of Network Failures in Public Clouds
Authors:
Pooria Namyar,
Behnaz Arzani,
Daniel Crankshaw,
Daniel S. Berger,
Kevin Hsieh,
Srikanth Kandula,
Ramesh Govindan
Abstract:
Some faults in data center networks require hours to days to repair because they may need reboots, re-imaging, or manual work by technicians. To reduce traffic impact, cloud providers mitigate the effect of faults, for example, by steering traffic to alternate paths. The state of the art in automatic network mitigations uses simple safety checks and proxy metrics to determine mitigations. SWARM, the approach described in this paper, can pick mitigations that are orders of magnitude better by estimating end-to-end connection-level performance (CLP) metrics. At its core is a scalable CLP estimator that quickly ranks mitigations with high fidelity and, on failures observed at a large cloud provider, outperforms the state of the art by over 700$\times$ in some cases.
Submitted 23 May, 2023;
originally announced May 2023.
-
SOL: Safe On-Node Learning in Cloud Platforms
Authors:
Yawen Wang,
Daniel Crankshaw,
Neeraja J. Yadwadkar,
Daniel Berger,
Christos Kozyrakis,
Ricardo Bianchini
Abstract:
Cloud platforms run many software agents on each server node. These agents manage all aspects of node operation, and in some cases frequently collect data and make decisions. Unfortunately, their behavior is typically based on pre-defined static heuristics or offline analysis; they do not leverage on-node machine learning (ML). In this paper, we first characterize the spectrum of node agents in Azure, and identify the classes of agents that are most likely to benefit from on-node ML. We then propose SOL, an extensible framework for designing ML-based agents that are safe and robust to the range of failure conditions that occur in production. SOL provides a simple API to agent developers and manages the scheduling and running of the agent-specific functions they write. We illustrate the use of SOL by implementing three ML-based agents that manage CPU cores, node power, and memory placement. Our experiments show that (1) ML substantially improves our agents, and (2) SOL ensures that agents operate safely under a variety of failure conditions. We conclude that ML-based agents show significant potential and that SOL can help build them.
Submitted 25 January, 2022;
originally announced January 2022.
-
InferLine: ML Prediction Pipeline Provisioning and Management for Tight Latency Objectives
Authors:
Daniel Crankshaw,
Gur-Eyal Sela,
Corey Zumar,
Xiangxi Mo,
Joseph E. Gonzalez,
Ion Stoica,
Alexey Tumanov
Abstract:
Serving ML prediction pipelines spanning multiple models and hardware accelerators is a key challenge in production machine learning. Optimally configuring these pipelines to meet tight end-to-end latency goals is complicated by the interaction between model batch size, the choice of hardware accelerator, and variation in the query arrival process.
In this paper we introduce InferLine, a system which provisions and manages the individual stages of prediction pipelines to meet end-to-end tail latency constraints while minimizing cost. InferLine consists of a low-frequency combinatorial planner and a high-frequency auto-scaling tuner. The low-frequency planner leverages stage-wise profiling, discrete event simulation, and constrained combinatorial search to automatically select hardware type, replication, and batching parameters for each stage in the pipeline. The high-frequency tuner uses network calculus to auto-scale each stage to meet tail latency goals in response to changes in the query arrival process. We demonstrate that InferLine outperforms existing approaches by up to 7.6x in cost while achieving up to 34.5x lower latency SLO miss rate on realistic workloads and generalizes across state-of-the-art model serving frameworks.
Submitted 3 August, 2020; v1 submitted 4 December, 2018;
originally announced December 2018.
-
Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning
Authors:
Richard Liaw,
Sanjay Krishnan,
Animesh Garg,
Daniel Crankshaw,
Joseph E. Gonzalez,
Ken Goldberg
Abstract:
Rather than learning new control policies for each new task, it is possible, when tasks share some structure, to compose a "meta-policy" from previously learned policies. This paper reports results from experiments using Deep Reinforcement Learning on a continuous-state, discrete-action autonomous driving simulator. We explore how Deep Neural Networks can represent meta-policies that switch among a set of previously learned policies, specifically in settings where the dynamics of a new scenario are composed of a mixture of previously learned dynamics and where the state observation is possibly corrupted by sensing noise. We also report the results of experiments varying dynamics mixes, distractor policies, magnitudes/distributions of sensing noise, and obstacles. In a fully observed experiment, the meta-policy learning algorithm achieves 2.6x the reward achieved by the next best policy composition technique with 80% less exploration. In a partially observed experiment, the meta-policy learning algorithm converges after 50 iterations while a direct application of RL fails to converge even after 200 iterations.
Submitted 4 November, 2017;
originally announced November 2017.
-
IDK Cascades: Fast Deep Learning by Learning not to Overthink
Authors:
Xin Wang,
Yujia Luo,
Daniel Crankshaw,
Alexey Tumanov,
Fisher Yu,
Joseph E. Gonzalez
Abstract:
Advances in deep learning have led to substantial increases in prediction accuracy but have been accompanied by increases in the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively "overthink" on simple inputs. In this paper, we revisit the classic question of building model cascades that primarily leverage class asymmetry to reduce cost. We introduce the "I Don't Know" (IDK) prediction cascades framework, a general framework to systematically compose a set of pre-trained models to accelerate inference without a loss in prediction accuracy. We propose two search-based methods for constructing cascades as well as a new cost-aware objective within this framework. The proposed IDK cascade framework can be easily adopted in existing model serving systems without additional model re-training. We evaluate the proposed techniques on a range of benchmarks to demonstrate the effectiveness of the proposed framework.
Submitted 27 June, 2018; v1 submitted 2 June, 2017;
originally announced June 2017.
-
Clipper: A Low-Latency Online Prediction Serving System
Authors:
Daniel Crankshaw,
Xin Wang,
Giulio Zhou,
Michael J. Franklin,
Joseph E. Gonzalez,
Ion Stoica
Abstract:
Machine learning is being deployed in a growing number of applications which demand real-time, accurate, and robust predictions under heavy query load. However, most machine learning frameworks and systems only address model training and not deployment.
In this paper, we introduce Clipper, a general-purpose low-latency prediction serving system. Interposing between end-user applications and a wide range of machine learning frameworks, Clipper introduces a modular architecture to simplify model deployment across frameworks and applications. Furthermore, by introducing caching, batching, and adaptive model selection techniques, Clipper reduces prediction latency and improves prediction throughput, accuracy, and robustness without modifying the underlying machine learning frameworks. We evaluate Clipper on four common machine learning benchmark datasets and demonstrate its ability to meet the latency, accuracy, and throughput demands of online serving applications. Finally, we compare Clipper to the TensorFlow Serving system and demonstrate that we are able to achieve comparable throughput and latency while enabling model composition and online learning to improve accuracy and render more robust predictions.
Submitted 28 February, 2017; v1 submitted 9 December, 2016;
originally announced December 2016.
-
The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox
Authors:
Daniel Crankshaw,
Peter Bailis,
Joseph E. Gonzalez,
Haoyuan Li,
Zhao Zhang,
Michael J. Franklin,
Ali Ghodsi,
Michael I. Jordan
Abstract:
To support complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused heavily on the design of systems to support training complex models on large datasets. Unfortunately, the design of these systems largely ignores a critical component of the overall analytics process: the deployment and serving of models at scale. In this work, we present Velox, a new component of the Berkeley Data Analytics Stack. Velox is a data management system for facilitating the next steps in real-world, large-scale analytics pipelines: online model management, maintenance, and serving. Velox provides end-user applications and services with a low-latency, intuitive interface to models, transforming the raw statistical models currently trained using existing offline large-scale compute frameworks into full-blown, end-to-end data products capable of recommending products, targeting advertisements, and personalizing web content. To provide up-to-date results for these complex models, Velox also facilitates lightweight online model maintenance and selection (i.e., dynamic weighting). In this paper, we describe the challenges and architectural considerations required to achieve this functionality, including the abilities to span online and offline systems, to adaptively adjust model materialization strategies, and to exploit inherent statistical properties such as model error tolerance, all while operating at "Big Data" scale.
Submitted 1 December, 2014; v1 submitted 12 September, 2014;
originally announced September 2014.
-
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
Authors:
Reynold S. Xin,
Daniel Crankshaw,
Ankur Dave,
Joseph E. Gonzalez,
Michael J. Franklin,
Ion Stoica
Abstract:
From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model.
To address these challenges we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
Submitted 11 February, 2014;
originally announced February 2014.
-
Probing Decoherence with Electromagnetically Induced Transparency in Superconductive Quantum Circuits
Authors:
K. V. R. M. Murali,
D. S. Crankshaw,
T. P. Orlando,
Z. Dutton,
W. D. Oliver
Abstract:
Superconductive quantum circuits (SQCs) comprise quantized energy levels that may be coupled via microwave electromagnetic fields. Described in this way, one may draw a close analogy to atoms with internal (electronic) levels coupled by laser light fields. In this Letter, we present a superconductive analog to electromagnetically induced transparency (S-EIT) that utilizes SQC designs of present day experimental consideration. We discuss how S-EIT can be used to establish macroscopic coherence in such systems and, thereby, utilized as a sensitive probe of decoherence.
Submitted 20 November, 2003;
originally announced November 2003.
-
DC measurements of macroscopic quantum levels in a superconducting qubit structure with a time-ordered meter
Authors:
D. S. Crankshaw,
K. Segall,
D. Nakada,
T. P. Orlando,
L. S. Levitov,
S. Lloyd,
S. O. Valenzuela,
N. Markovic,
M. Tinkham,
K. K. Berggren
Abstract:
DC measurements are made in a superconducting, persistent current qubit structure with a time-ordered meter. The persistent-current qubit has a double-well potential, with the two minima corresponding to magnetization states of opposite sign. Macroscopic resonant tunneling between the two wells is observed at values of energy bias that correspond to the positions of the calculated quantum levels. The magnetometer, a Superconducting Quantum Interference Device (SQUID), detects the state of the qubit in a time-ordered fashion, measuring one state before the other. This results in a different meter output depending on the initial state, providing different signatures of the energy levels for each tunneling direction. From these measurements, the intrawell relaxation time is found to be about 50 microseconds.
Submitted 20 November, 2003; v1 submitted 19 November, 2003;
originally announced November 2003.
-
Energy Relaxation Time between Macroscopic Quantum Levels in a Superconducting Persistent Current Qubit
Authors:
Yang Yu,
D. Nakada,
Janice C. Lee,
Bhuwan Singh,
D. S. Crankshaw,
T. P. Orlando,
William D. Oliver,
Karl K. Berggren
Abstract:
We measured the intrawell energy relaxation time τ_{d} between macroscopic quantum levels in the double well potential of a Nb persistent-current qubit. Interwell population transitions were generated by irradiating the qubit with microwaves. Zero population in the initial well was then observed due to a multi-level decay process in which the initial population relaxed to the lower energy levels during transitions. The qubit's decoherence time, determined from τ_{d}, is longer than 20 microseconds, holding the promise of building a quantum computer with Nb-based superconducting qubits.
Submitted 13 November, 2003; v1 submitted 12 November, 2003;
originally announced November 2003.
-
Impact of time-ordered measurements of the two states in a niobium superconducting qubit structure
Authors:
K. Segall,
D. Crankshaw,
D. Nakada,
T. P. Orlando,
L. S. Levitov,
S. Lloyd,
N. Markovic,
S. O. Valenzuela,
M. Tinkham,
K. K. Berggren
Abstract:
Measurements of thermal activation are made in a superconducting, niobium Persistent-Current (PC) qubit structure, which has two stable classical states of equal and opposite circulating current. The magnetization signal is read out by ramping the bias current of a DC SQUID. This ramping causes time-ordered measurements of the two states, where measurement of one state occurs before the other. This time-ordering results in an effective measurement time, which can be used to probe the thermal activation rate between the two states. Fitting the magnetization signal as a function of temperature and ramp time allows one to estimate a quality factor of 10^6 for our devices, a value favorable for the observation of long quantum coherence times at lower temperatures.
Submitted 26 February, 2003;
originally announced February 2003.