QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture
Authors:
Shvetank Prakash,
Andrew Cheng,
Jason Yik,
Arya Tschand,
Radhika Ghosal,
Ikechukwu Uchendu,
Jessica Quaye,
Jeffrey Ma,
Shreyas Grampurohit,
Sofia Giannuzzi,
Arnav Balyan,
Fin Amin,
Aadya Pipersenia,
Yash Choudhary,
Ankita Nayak,
Amir Yazdanbakhsh,
Vijay Janapa Reddi
Abstract:
We introduce QuArch, a dataset of 1500 human-validated question-answer pairs designed to evaluate and enhance language models' understanding of computer architecture. The dataset covers areas including processor design, memory systems, and performance optimization. Our analysis highlights a significant performance gap: the best closed-source model achieves 84% accuracy, while the top small open-source model reaches 72%. We observe notable struggles in memory systems, interconnection networks, and benchmarking. Fine-tuning with QuArch improves small model accuracy by up to 8%, establishing a foundation for advancing AI-driven computer architecture research. The dataset and leaderboard are at https://harvard-edge.github.io/QuArch/.
Submitted 6 January, 2025; v1 submitted 3 January, 2025;
originally announced January 2025.
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI
Authors:
Arya Tschand,
Arun Tejusve Raghunath Rajan,
Sachin Idgunji,
Anirban Ghosh,
Jeremy Holleman,
Csaba Kiraly,
Pawan Ambalkar,
Ritika Borkar,
Ramesh Chukka,
Trevor Cockrell,
Oliver Curtis,
Grigori Fursin,
Miro Hodak,
Hiwot Kassa,
Anton Lokhmotov,
Dejan Miskovic,
Yuechao Pan,
Manu Prasad Manmathan,
Liz Raymond,
Tom St. John,
Arjun Suresh,
Rowan Taubitz,
Sean Zhan,
Scott Wasson,
David Kanter
, et al. (1 additional author not shown)
Abstract:
Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This paper introduces MLPerf Power, a comprehensive benchmarking methodology with capabilities to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. We use representative workloads from the MLPerf benchmark suite to collect 1,841 reproducible measurements from 60 systems across the entire range of ML deployment scales. Our analysis reveals trade-offs between performance, complexity, and energy efficiency across this wide range of systems, providing actionable insights for designing optimized ML solutions from the smallest edge devices to the largest cloud infrastructures. This work emphasizes the importance of energy efficiency as a key metric in the evaluation and comparison of ML systems, laying the foundation for future research in this critical area. We discuss the implications for developing sustainable AI solutions and standardizing energy efficiency benchmarking for ML systems.
Submitted 5 February, 2025; v1 submitted 15 October, 2024;
originally announced October 2024.