-
Turning AI Data Centers into Grid-Interactive Assets: Results from a Field Demonstration in Phoenix, Arizona
Authors:
Philip Colangelo,
Ayse K. Coskun,
Jack Megrue,
Ciaran Roberts,
Shayan Sengupta,
Varun Sivaram,
Ethan Tiao,
Aroon Vijaykar,
Chris Williams,
Daniel C. Wilson,
Zack MacFarland,
Daniel Dreiling,
Nathan Morey,
Anuja Ratnayake,
Baskar Vairamohan
Abstract:
Artificial intelligence (AI) is fueling exponential electricity demand growth, threatening grid reliability, raising prices for communities paying for new energy infrastructure, and stunting AI innovation as data centers wait for interconnection to constrained grids. This paper presents the first field demonstration, in collaboration with major corporate partners, of a software-only approach--Emerald Conductor--that transforms AI data centers into flexible grid resources that can efficiently and immediately harness existing power systems without massive infrastructure buildout. Conducted at a 256-GPU cluster running representative AI workloads within a commercial, hyperscale cloud data center in Phoenix, Arizona, the trial achieved a 25% reduction in cluster power usage for three hours during peak grid events while maintaining AI quality of service (QoS) guarantees. By orchestrating AI workloads based on real-time grid signals without hardware modifications or energy storage, this platform reimagines data centers as grid-interactive assets that enhance grid reliability, advance affordability, and accelerate AI's development.
Submitted 1 July, 2025;
originally announced July 2025.
-
An Online Probabilistic Distributed Tracing System
Authors:
M. Toslali,
S. Qasim,
S. Parthasarathy,
F. A. Oliveira,
H. Huang,
G. Stringhini,
Z. Liu,
A. K. Coskun
Abstract:
Distributed tracing has become a fundamental tool for diagnosing performance issues in the cloud by recording causally ordered, end-to-end workflows of request executions. However, tracing in production workloads can introduce significant overheads due to the extensive instrumentation needed for identifying performance variations. This paper addresses the trade-off between the cost of tracing and the utility of the "spans" within that trace through Astraea, an online probabilistic distributed tracing system. Astraea is based on our technique that combines online Bayesian learning and multi-armed bandit frameworks. This formulation enables Astraea to effectively steer tracing towards the useful instrumentation needed for accurate performance diagnosis. Astraea localizes performance variations using only 10-28% of available instrumentation, markedly reducing tracing overhead, storage, compute costs, and trace analysis time.
Submitted 24 May, 2024;
originally announced May 2024.
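The abstract above describes combining online Bayesian learning with a multi-armed bandit to steer tracing toward useful instrumentation. As a rough illustration of that general idea only, the sketch below uses generic Beta-Bernoulli Thompson sampling; it is not Astraea's actual algorithm. The span names, the binary "useful" reward, and the `SpanSampler` and `simulate` helpers are all hypothetical assumptions introduced here.

```python
import random


class SpanSampler:
    """Beta-Bernoulli Thompson sampling over instrumentation points.

    Hypothetical illustration: each span is a bandit arm, and the reward is
    1 when enabling that span turned out to be useful for diagnosis.
    """

    def __init__(self, span_names, seed=0):
        self.rng = random.Random(seed)
        # One Beta(alpha, beta) posterior per span, starting from a uniform prior.
        self.posterior = {name: [1.0, 1.0] for name in span_names}

    def choose(self):
        # Draw one sample from each posterior and pick the span with the largest draw.
        draws = {
            name: self.rng.betavariate(a, b)
            for name, (a, b) in self.posterior.items()
        }
        return max(draws, key=draws.get)

    def update(self, name, useful):
        # Bayesian update: a useful observation increments alpha, otherwise beta.
        if useful:
            self.posterior[name][0] += 1.0
        else:
            self.posterior[name][1] += 1.0

    def mean(self, name):
        # Posterior mean estimate of the span's usefulness.
        a, b = self.posterior[name]
        return a / (a + b)


def simulate(true_utility, rounds=2000, seed=0):
    """Run the sampler against assumed (synthetic) per-span usefulness rates."""
    sampler = SpanSampler(list(true_utility), seed=seed)
    env = random.Random(seed + 1)
    counts = {name: 0 for name in true_utility}
    for _ in range(rounds):
        span = sampler.choose()
        counts[span] += 1
        sampler.update(span, env.random() < true_utility[span])
    return sampler, counts


if __name__ == "__main__":
    # Synthetic usefulness rates, chosen only for the demonstration.
    rates = {"db_query": 0.8, "cache_lookup": 0.1, "auth_check": 0.05}
    sampler, counts = simulate(rates)
    for name in rates:
        print(name, counts[name], round(sampler.mean(name), 2))
```

In this toy setup the sampler quickly concentrates its choices on the span with the highest assumed usefulness, which mirrors the abstract's point that only a fraction of the available instrumentation needs to stay enabled.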
-
A New Dataflow Implementation to Improve Energy Efficiency of Monolithic 3D Systolic Arrays
Authors:
Prachi Shukla,
Vasilis F. Pavlidis,
Emre Salman,
Ayse K. Coskun
Abstract:
Systolic arrays are popular for executing deep neural networks (DNNs) at the edge. Low latency and energy efficiency are key requirements in edge devices such as drones and autonomous vehicles. Monolithic 3D (MONO3D) is an emerging 3D integration technique that offers ultra-high bandwidth among processing and memory elements with a negligible area overhead. Such high bandwidth can help meet the ever-growing latency and energy efficiency demands for DNNs. This paper presents a novel implementation for weight stationary (WS) dataflow in MONO3D systolic arrays, called WS-MONO3D. WS-MONO3D utilizes multiple resistive RAM layers and SRAM with high-density vertical interconnects to multicast inputs and perform high-bandwidth weight pre-loading while maintaining the same order of multiply-and-accumulate operations as in native WS dataflow. Consequently, WS-MONO3D eliminates input and weight forwarding cycles and, thus, provides up to 40% improvement in energy-delay-product (EDP) over the native WS implementation in 2D at iso-configuration. WS-MONO3D also provides 10X improvement in inference per second per watt per footprint due to multiple vertical tiers. Finally, we also show that temperature impacts the energy efficiency benefits in WS-MONO3D.
Submitted 7 January, 2024;
originally announced January 2024.
-
Temperature-Aware Monolithic 3D DNN Accelerators for Biomedical Applications
Authors:
Prachi Shukla,
Vasilis F. Pavlidis,
Emre Salman,
Ayse K. Coskun
Abstract:
In this paper, we focus on temperature-aware Monolithic 3D (Mono3D) deep neural network (DNN) inference accelerators for biomedical applications. We develop an optimizer that tunes aspect ratios and footprint of the accelerator under user-defined performance and thermal constraints, and generates near-optimal configurations. Using the proposed Mono3D optimizer, we demonstrate up to 61% improvement in energy efficiency for biomedical applications over a performance-optimized accelerator.
Submitted 29 March, 2022;
originally announced March 2022.
-
Architecting Optically-Controlled Phase Change Memory
Authors:
Aditya Narayan,
Yvain Thonnart,
Pascal Vivet,
Ayse K. Coskun,
Ajay Joshi
Abstract:
Phase Change Memory (PCM) is an attractive candidate for main memory as it offers non-volatility and zero leakage power, while providing higher cell densities, longer data retention time, and higher capacity scaling compared to DRAM. In PCM, data is stored in the crystalline or amorphous state of the phase change material. The typical electrically-controlled PCM (EPCM), however, suffers from longer write latency and higher write energy compared to DRAM and limited multi-level cell (MLC) capacities. These challenges limit the performance of data-intensive applications running on computing systems with EPCMs.
Recently, researchers demonstrated optically-controlled PCM (OPCM) cells, with support for 5 bits/cell in contrast to 2 bits/cell in EPCM. These OPCM cells can be accessed directly with optical signals that are multiplexed in high-bandwidth-density silicon-photonic links. The higher MLC capacity in OPCM and the direct cell access using optical signals enable an increased read/write throughput and lower energy per access than EPCM. However, due to the direct cell access using optical signals, OPCM systems cannot be designed using conventional memory architecture. We need a complete redesign of the memory architecture that is tailored to the properties of OPCM technology.
This paper presents the design of a unified network and main memory system called COSMOS that combines OPCM and silicon-photonic links to achieve high memory throughput. COSMOS is composed of a hierarchical multi-banked OPCM array with novel read and write access protocols, and uses an Electrical-Optical-Electrical (E-O-E) control unit to interface with the processor. Our evaluation of a 2.5D-integrated system containing a processor and COSMOS demonstrates 2.14x average speedup compared to an EPCM system. COSMOS consumes 3.8x lower read energy-per-bit and 5.97x lower write energy-per-bit compared to EPCM.
Submitted 23 July, 2021;
originally announced July 2021.
-
Counterfactual Explanations for Machine Learning on Multivariate Time Series Data
Authors:
Emre Ates,
Burak Aksar,
Vitus J. Leung,
Ayse K. Coskun
Abstract:
Applying machine learning (ML) to multivariate time series data is growing in popularity in many application domains, including computer system management. For example, recent high performance computing (HPC) research proposes a variety of ML frameworks that use system telemetry data in the form of multivariate time series to detect performance variations, perform intelligent scheduling or node allocation, and improve system security. Common barriers to adoption of these ML frameworks include the lack of user trust and the difficulty of debugging. These barriers need to be overcome to enable the widespread adoption of ML frameworks in production systems. To address this challenge, this paper proposes a novel explainability technique for providing counterfactual explanations for supervised ML frameworks that use multivariate time series data. The proposed method outperforms state-of-the-art explainability methods on several different ML frameworks and data sets in metrics such as faithfulness and robustness. The paper also demonstrates how the proposed method can be used to debug ML frameworks and gain a better understanding of HPC system telemetry data.
Submitted 24 August, 2020;
originally announced August 2020.
-
ConfEx: A Framework for Automating Text-based Software Configuration Analysis in the Cloud
Authors:
Ozan Tuncer,
Anthony Byrne,
Nilton Bila,
Sastry Duri,
Canturk Isci,
Ayse K. Coskun
Abstract:
Modern cloud services have complex architectures, often comprising many software components, and depend on hundreds of configuration parameters to function correctly, securely, and with high performance. Due to the prevalence of open-source software, developers can easily deploy services using third-party software without mastering the configurations of that software. As a result, configuration errors (i.e., misconfigurations) are among the leading causes of service disruptions and outages. While existing cloud automation tools ease the process of service deployment and management, support for detecting misconfigurations in the cloud has not been addressed thoroughly, likely due to the lack of frameworks suitable for consistent parsing of unstandardized configuration files. This paper introduces ConfEx, a framework that enables discovery and extraction of text-based software configurations in the cloud. ConfEx uses a novel vocabulary-based technique to identify configuration files in cloud system instances with unlabeled content. To extract the information in these files, ConfEx leverages existing configuration parsers and post-processes the extracted data for analysis. We show that ConfEx achieves over 99% precision and 100% recall in identifying configuration files on 7805 popular Docker Hub images. Using two applied examples, we demonstrate that ConfEx also enables detecting misconfigurations in the cloud via existing tools that are designed for configurations represented as key-value pairs, revealing 184 errors in public Docker Hub images.
Submitted 31 August, 2020; v1 submitted 19 August, 2020;
originally announced August 2020.