-
Good things come in small packages: Should we build AI clusters with Lite-GPUs?
Authors:
Burcu Canakci,
Junyi Liu,
Xingbo Wu,
Nathanaël Cheriere,
Paolo Costa,
Sergey Legtchenko,
Dushyanth Narayanan,
Ant Rowstron
Abstract:
To match the blooming demand of generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling limitations. We propose to rethink the de…
▽ More
To match the blooming demand of generative AI workloads, GPU designers have so far been trying to pack more and more compute and memory into single complex and expensive packages. However, there is growing uncertainty about the scalability of individual GPUs and thus AI clusters, as state-of-the-art GPUs are already displaying packaging, yield, and cooling limitations. We propose to rethink the design and scaling of AI clusters through efficiently-connected large clusters of Lite-GPUs, GPUs with single, small dies and a fraction of the capabilities of larger GPUs. We think recent advances in co-packaged optics can enable distributing AI workloads onto many Lite-GPUs through high bandwidth and efficient communication. In this paper, we present the key benefits of Lite-GPUs on manufacturing cost, blast radius, yield, and power efficiency; and discuss systems opportunities and challenges around resource, workload, memory, and network management.
△ Less
Submitted 29 April, 2025; v1 submitted 17 January, 2025;
originally announced January 2025.
-
Managed-Retention Memory: A New Class of Memory for the AI Era
Authors:
Sergey Legtchenko,
Ioan Stefanovici,
Richard Black,
Antony Rowstron,
Junyi Liu,
Paolo Costa,
Burcu Canakci,
Dushyanth Narayanan,
Xingbo Wu
Abstract:
AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and read bandwidth, and also has significant energy per bit overheads. It is also expensive, with lower yield than DRAM due to manufacturing complexity. We propose a n…
▽ More
AI clusters today are one of the major uses of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows HBM is overprovisioned on write performance, but underprovisioned on density and read bandwidth, and also has significant energy per bit overheads. It is also expensive, with lower yield than DRAM due to manufacturing complexity. We propose a new memory class: Managed-Retention Memory (MRM), which is more optimized to store key data structures for AI inference workloads. We believe that MRM may finally provide a path to viability for technologies that were originally proposed to support Storage Class Memory (SCM). These technologies traditionally offered long-term persistence (10+ years) but provided poor IO performance and/or endurance. MRM makes different trade-offs, and by understanding the workload IO patterns, MRM foregoes long-term data retention and write performance for better potential performance on the metrics important for these workloads.
△ Less
Submitted 16 January, 2025;
originally announced January 2025.
-
Roadmap on Neuromorphic Photonics
Authors:
Daniel Brunner,
Bhavin J. Shastri,
Mohammed A. Al Qadasi,
H. Ballani,
Sylvain Barbay,
Stefano Biasi,
Peter Bienstman,
Simon Bilodeau,
Wim Bogaerts,
Fabian Böhm,
G. Brennan,
Sonia Buckley,
Xinlun Cai,
Marcello Calvanese Strinati,
B. Canakci,
Benoit Charbonnier,
Mario Chemnitz,
Yitong Chen,
Stanley Cheung,
Jeff Chiles,
Suyeon Choi,
Demetrios N. Christodoulides,
Lukas Chrostowski,
J. Chu,
J. H. Clegg
, et al. (125 additional authors not shown)
Abstract:
This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field.
This roadmap consolidates recent advances while exploring emerging applications, reflecting the remarkable diversity of hardware platforms, neuromorphic concepts, and implementation philosophies reported in the field. It emphasizes the critical role of cross-disciplinary collaboration in this rapidly evolving field.
△ Less
Submitted 16 January, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
-
Analog Iterative Machine (AIM): using light to solve quadratic optimization problems with mixed variables
Authors:
Kirill P. Kalinin,
George Mourgias-Alexandris,
Hitesh Ballani,
Natalia G. Berloff,
James H. Clegg,
Daniel Cletheroe,
Christos Gkantsidis,
Istvan Haller,
Vassily Lyutsarev,
Francesca Parmigiani,
Lucinda Pickup,
Antony Rowstron
Abstract:
Solving optimization problems is challenging for existing digital computers and even for future quantum hardware. The practical importance of diverse problems, from healthcare to financial optimization, has driven the emergence of specialised hardware over the past decade. However, their support for problems with only binary variables severely restricts the scope of practical problems that can be…
▽ More
Solving optimization problems is challenging for existing digital computers and even for future quantum hardware. The practical importance of diverse problems, from healthcare to financial optimization, has driven the emergence of specialised hardware over the past decade. However, their support for problems with only binary variables severely restricts the scope of practical problems that can be efficiently embedded. We build analog iterative machine (AIM), the first instance of an opto-electronic solver that natively implements a wider class of quadratic unconstrained mixed optimization (QUMO) problems and supports all-to-all connectivity of both continuous and binary variables.Beyond synthetic 7-bit problems at small-scale, AIM solves the financial transaction settlement problem entirely in analog domain with higher accuracy than quantum hardware and at room temperature. With compute-in-memory operation and spatial-division multiplexed representation of variables, the design of AIM paves the path to chip-scale architecture with 100 times speed-up per unit-power over the latest GPUs for solving problems with 10,000 variables. The robustness of the AIM algorithm at such scale is further demonstrated by comparing it with commercial production solvers across multiple benchmarks, where for several problems we report new best solutions. By combining the superior QUMO abstraction, sophisticated gradient descent methods inspired by machine learning, and commodity hardware, AIM introduces a novel platform with a step change in expressiveness, performance, and scalability, for optimization in the post-Moores law era.
△ Less
Submitted 20 June, 2023; v1 submitted 25 April, 2023;
originally announced April 2023.