Isambard-AI: a leadership class supercomputer optimised specifically for Artificial Intelligence

McIntosh-Smith, Simon; Alam, Sadaf R; Woods, Christopher

arXiv:2410.11199 (cs)

[Submitted on 15 Oct 2024 (v1), last revised 4 Nov 2024 (this version, v2)]

Abstract: Isambard-AI is a new, leadership-class supercomputer designed to support AI-related research. Based on the HPE Cray EX4000 system and housed in a new, energy-efficient Modular Data Centre in Bristol, UK, Isambard-AI employs 5,448 NVIDIA Grace-Hopper GPUs to deliver over 21 ExaFLOP/s of 8-bit floating-point performance for LLM training, and over 250 PetaFLOP/s of 64-bit performance, for under 5 MW. Isambard-AI integrates two all-flash storage systems: a 20 PiByte Cray ClusterStor and a 3.5 PiByte VAST solution. Combined, these give Isambard-AI flexibility for training, inference, and secure data access and sharing. But it is in the software stack that Isambard-AI will differ most from traditional HPC systems. Isambard-AI is designed to support users who may have been using GPUs in the cloud, so access will more typically be via Jupyter notebooks, MLOps, or other web-based, interactive interfaces, rather than the traditional supercomputing approach of SSHing into a system and submitting jobs to a batch scheduler. Its stack is designed to be upgraded quickly and regularly to keep pace with the rapid evolution of AI software, with full support for containers. Phase 1 of Isambard-AI is due online in May/June 2024, with the full system expected in production by the end of the year.
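
The headline figures in the abstract can be turned into rough per-GPU and per-watt numbers with simple division. The sketch below is illustrative only: it assumes the aggregate throughput is spread evenly across all 5,448 GPUs, and it treats the quoted "over" and "under" values as exact, so the results are approximate lower bounds rather than measured values. The constant and function names are hypothetical and do not come from the paper.

```python
# Figures quoted in the abstract. "over"/"under" make them bounds, not exact values.
GPUS = 5_448            # NVIDIA Grace-Hopper GPUs
FP8_FLOPS = 21e18       # "over 21 ExaFLOP/s" of 8-bit floating-point performance
FP64_FLOPS = 250e15     # "over 250 PetaFLOP/s" of 64-bit performance
POWER_WATTS = 5e6       # "under 5 MW"


def per_gpu(total_flops: float, gpus: int = GPUS) -> float:
    """Average throughput per GPU, assuming an even split (an idealisation)."""
    return total_flops / gpus


def flops_per_watt(total_flops: float, watts: float = POWER_WATTS) -> float:
    """Throughput per watt at the quoted power ceiling."""
    return total_flops / watts


if __name__ == "__main__":
    print(f"FP8 per GPU:   {per_gpu(FP8_FLOPS) / 1e15:.2f} PFLOP/s")
    print(f"FP64 per GPU:  {per_gpu(FP64_FLOPS) / 1e12:.2f} TFLOP/s")
    print(f"FP8 per watt:  {flops_per_watt(FP8_FLOPS) / 1e12:.2f} TFLOP/s/W")
    print(f"FP64 per watt: {flops_per_watt(FP64_FLOPS) / 1e9:.2f} GFLOP/s/W")
```

Because the system is quoted as delivering more than these throughputs while drawing less than 5 MW, the per-watt results understate the efficiency implied by the abstract.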

Comments:	11 pages, 11 figures
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.11199 [cs.DC]
	(or arXiv:2410.11199v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2410.11199

Submission history

From: Christopher Woods
[v1] Tue, 15 Oct 2024 02:34:26 UTC (1,511 KB)
[v2] Mon, 4 Nov 2024 12:47:31 UTC (1,817 KB)
