Search | arXiv e-print repository

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation

Authors: Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez-Llorca

Abstract: Quantitative Artificial Intelligence (AI) benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an increasingly prominent role in regulatory frameworks. As their influence grows, however, so too do concerns about how and with what effects they evaluate highly sensitive topics such as capabilities, including high-impact capabilities, safety and systemic risks. This paper presents an interdisciplinary meta-review of about 100 studies, published in the last 10 years, that discuss shortcomings in quantitative benchmarking practices. It brings together many fine-grained issues in the design and application of benchmarks (such as biases in dataset creation, inadequate documentation, data contamination, and failures to distinguish signal from noise) with broader sociotechnical issues (such as an over-focus on evaluating text-based AI models according to a one-time testing logic that fails to account for how AI models are increasingly multimodal and interact with humans and other technical systems). Our review also highlights a series of systemic flaws in current benchmarking practices, such as misaligned incentives, construct validity issues, unknown unknowns, and problems with the gaming of benchmark results.
Furthermore, it underscores how benchmark practices are fundamentally shaped by cultural, commercial and competitive dynamics that often prioritise state-of-the-art performance at the expense of broader societal concerns. By providing an overview of risks associated with existing benchmarking procedures, we problematise the disproportionate trust placed in benchmarks and contribute to ongoing efforts to improve the accountability and relevance of quantitative AI benchmarks within the complexities of real-world scenarios.

Submitted 25 May, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: Under review as conference paper

ACM Class: I.2.0; A.1

arXiv:2412.07676

BATIS: Bootstrapping, Autonomous Testing, and Initialization System for Quantum Dot Devices

Authors: Tyler J. Kovach, Daniel Schug, M. A. Wolfe, E. R. MacQuarrie, Patrick J. Walsh, Jared Benson, Mark Friesen, M. A. Eriksson, Justyna P. Zwolak

Abstract: Semiconductor quantum dot (QD) devices have become central to advancements in spin-based quantum computing. As the complexity of QD devices grows, manual tuning becomes increasingly infeasible, necessitating robust and scalable autotuning solutions. Tuning large arrays of QD qubits depends on efficient choices of automated protocols. Here, we introduce a bootstrapping, autonomous testing, and initialization system (BATIS) designed to streamline QD device evaluation and calibration. BATIS navigates high-dimensional gate voltage spaces, automating essential steps such as leakage testing and gate characterization. For forming the current channels, BATIS follows a non-standard approach that requires a single measurement regardless of the number of channels. Demonstrated at 1.3 K on a quad-QD Si/Si$_x$Ge$_{1-x}$ device, BATIS eliminates the need for deep cryogenic environments during initial device diagnostics, significantly enhancing scalability and reducing setup times. By requiring only minimal prior knowledge of the device architecture, BATIS represents a platform-agnostic solution, adaptable to various QD systems, which bridges a critical gap in QD autotuning.

Submitted 19 December, 2024; v1 submitted 10 December, 2024; originally announced December 2024.

Comments: 10 pages, 3 figures

arXiv:2402.13699

doi 10.1088/2632-2153/ada087

Automation of Quantum Dot Measurement Analysis via Explainable Machine Learning

Authors: Daniel Schug, Tyler J. Kovach, M. A. Wolfe, Jared Benson, Sanghyeok Park, J. P. Dodson, J. Corrigan, M. A. Eriksson, Justyna P. Zwolak

Abstract: The rapid development of quantum dot (QD) devices for quantum computing has necessitated more efficient and automated methods for device characterization and tuning. This work demonstrates the feasibility and advantages of applying explainable machine learning techniques to the analysis of quantum dot measurements, paving the way for further advances in automated and transparent QD device tuning. Many of the measurements acquired during the tuning process come in the form of images that need to be properly analyzed to guide the subsequent tuning steps. By design, features present in such images capture certain behaviors or states of the measured QD devices. When considered carefully, such features can aid the control and calibration of QD devices. An important example of such images is the so-called triangle plot, which visually represents current flow and reveals characteristics important for QD device calibration. While image-based classification tools, such as convolutional neural networks (CNNs), can be used to verify whether a given measurement is "good" and thus warrants the initiation of the next phase of tuning, they do not provide any insight into how the device should be adjusted in the case of "bad" images. This is because CNNs trade the intelligibility of both their predictions and the model itself for high accuracy. To ameliorate this trade-off, a recent study introduced an image vectorization approach that relies on the Gabor wavelet transform (Schug et al. 2024 Proc. XAI4Sci: Explainable Machine Learning for Sciences Workshop (AAAI 2024) (Vancouver, Canada) pp 1-6). Here we propose an alternative vectorization method that involves mathematical modeling of synthetic triangles to mimic the experimental data. Using explainable boosting machines, we show that this new method offers superior explainability of model prediction without sacrificing accuracy.

Submitted 13 January, 2025; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: 20 pages, 5 figures, abbreviated version published in Proceedings of the XAI4Sci: Explainable machine learning for sciences workshop at AAAI 2024 (Vancouver, Canada)

Journal ref: Mach. Learn.: Sci. Technol. 6, 015006 (2025)

arXiv:2312.14322

doi 10.1038/s41534-024-00878-x

Data needs and challenges for quantum dot devices automation

Authors: Justyna P. Zwolak, Jacob M. Taylor, Reed W. Andrews, Jared Benson, Garnett W. Bryant, Donovan Buterakos, Anasua Chatterjee, Sankar Das Sarma, Mark A. Eriksson, Eliška Greplová, Michael J. Gullans, Fabian Hader, Tyler J. Kovach, Pranav S. Mundada, Mick Ramsey, Torbjørn Rasmussen, Brandon Severin, Anthony Sigillito, Brennan Undseth, Brian Weber

Abstract: Gate-defined quantum dots are a promising candidate system for realizing scalable, coupled qubit systems and serving as a fundamental building block for quantum computers. However, present-day quantum dot devices suffer from imperfections that must be accounted for and that hinder the characterization, tuning, and operation process. Moreover, with an increasing number of quantum dot qubits, the relevant parameter space grows large enough to make heuristic control infeasible. Thus, it is imperative that reliable and scalable autonomous tuning approaches are developed. This meeting report outlines current challenges in automating quantum dot device tuning and operation with a particular focus on datasets, benchmarking, and standardization. We also present insights and ideas put forward by the quantum dot community on how to overcome them. We aim to provide guidance and inspiration to researchers invested in automation efforts.

Submitted 5 November, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: A meeting report from a workshop held at the National Institute of Standards and Technology, Gaithersburg, MD

Journal ref: npj Quantum Inf. 10, 105 (2024)

arXiv:2108.00043

doi 10.1103/PhysRevApplied.17.024069

Toward Robust Autotuning of Noisy Quantum Dot Devices

Authors: Joshua Ziegler, Thomas McJunkin, E. S. Joseph, Sandesh S. Kalantre, Benjamin Harpt, D. E. Savage, M. G. Lagally, M. A. Eriksson, Jacob M. Taylor, Justyna P. Zwolak

Abstract: The current autotuning approaches for quantum dot (QD) devices, while showing some success, lack an assessment of data reliability. This leads to unexpected failures when noisy or otherwise low-quality data is processed by an autonomous system. In this work, we propose a framework for robust autotuning of QD devices that combines a machine learning (ML) state classifier with a data quality control module. The data quality control module acts as a "gatekeeper" system, ensuring that only reliable data are processed by the state classifier. Lower data quality results in either device recalibration or termination. To train both ML systems, we enhance the QD simulation by incorporating synthetic noise typical of QD experiments. We confirm that the inclusion of synthetic noise in the training of the state classifier significantly improves the performance, resulting in an accuracy of 95.0(9) % when tested on experimental data. We then validate the functionality of the data quality control module by showing that the state classifier performance deteriorates with decreasing data quality, as expected. Our results establish a robust and flexible ML framework for autonomous tuning of noisy QD devices.
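The "gatekeeper" control flow described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scalar quality score in [0, 1], the two thresholds (`accept`, `recover`), the choice to map moderate quality to recalibration and poor quality to termination, and the names `gatekeeper` and `autotune_step` are all hypothetical. The trained quality-control and state-classifier models are replaced by plain callables.

```python
from enum import Enum
from typing import Callable, Optional, Sequence, Tuple


class Action(Enum):
    CLASSIFY = "classify"        # data judged reliable: pass the scan to the state classifier
    RECALIBRATE = "recalibrate"  # too noisy to classify, assumed recoverable by recalibration
    TERMINATE = "terminate"      # too noisy to continue autotuning


def gatekeeper(quality: float, accept: float = 0.8, recover: float = 0.5) -> Action:
    """Map a hypothetical quality score in [0, 1] to an action (thresholds are assumptions)."""
    if quality >= accept:
        return Action.CLASSIFY
    if quality >= recover:
        return Action.RECALIBRATE
    return Action.TERMINATE


def autotune_step(
    scan: Sequence[float],
    quality_fn: Callable[[Sequence[float]], float],
    classifier: Callable[[Sequence[float]], str],
) -> Tuple[Action, Optional[str]]:
    """Run the state classifier only when the gatekeeper accepts the scan."""
    action = gatekeeper(quality_fn(scan))
    if action is Action.CLASSIFY:
        return action, classifier(scan)
    return action, None


# Usage with stand-in models: a fixed quality score and a constant "classifier".
print(autotune_step([0.1, 0.2], lambda s: 0.9, lambda s: "double dot"))
print(autotune_step([0.1, 0.2], lambda s: 0.6, lambda s: "double dot"))
```

The key design point is ordering: the quality check runs before the classifier, so a low-quality scan never produces a state label that could steer the tuning loop in the wrong direction.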

Submitted 8 September, 2022; v1 submitted 30 July, 2021; originally announced August 2021.

Comments: 12 pages, 6 figures

Journal ref: Phys. Rev. Applied 17, 024069 (2022)

arXiv:2102.11784

doi 10.1103/PRXQuantum.2.020335

Ray-based framework for state identification in quantum dot devices

Authors: Justyna P. Zwolak, Thomas McJunkin, Sandesh S. Kalantre, Samuel F. Neyens, E. R. MacQuarrie, Mark A. Eriksson, Jacob M. Taylor

Abstract: Quantum dots (QDs) defined with electrostatic gates are a leading platform for a scalable quantum computing implementation. However, with increasing numbers of qubits, the complexity of the control parameter space also grows. Traditional measurement techniques, relying on complete or near-complete exploration via two-parameter scans (images) of the device response, quickly become impractical with increasing numbers of gates. Here we propose to circumvent this challenge by introducing a measurement technique relying on one-dimensional projections of the device response in the multidimensional parameter space. Using this machine learning approach, dubbed the "ray-based classification (RBC) framework," we implement a classifier for QD states, enabling automated recognition of qubit-relevant parameter regimes. We show that RBC surpasses the 82 % accuracy benchmark from the experimental implementation of image-based classification techniques from prior work while reducing the number of measurement points needed by up to 70 %. The reduction in measurement cost is a significant gain for time-intensive QD measurements and is a step forward toward the scalability of these devices.
We also discuss how the RBC-based optimizer, which tunes the device to a multiqubit regime, performs when tuning in the two-dimensional and three-dimensional parameter spaces defined by plunger and barrier gates that control the QDs. This work provides experimental validation of both efficient state identification and optimization with machine learning techniques for non-traditional measurements in quantum systems with high-dimensional parameter spaces and time-intensive measurements.
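The scaling argument behind one-dimensional projections can be sketched in Python. This covers only the measurement geometry, assuming rays emanate from a single point in gate-voltage space; the RBC classifier that acts on the measured rays is not reproduced. The helper names (`directions_2d`, `ray_points`, `grid_cost`, `ray_cost`), the ray count, ray length, and resolution are illustrative assumptions, and the resulting point counts are not the paper's reported savings.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def directions_2d(n_rays: int) -> List[Point]:
    """Evenly spaced unit vectors in a 2D gate-voltage plane (e.g. two plunger gates)."""
    return [
        (math.cos(2 * math.pi * k / n_rays), math.sin(2 * math.pi * k / n_rays))
        for k in range(n_rays)
    ]


def ray_points(center: Sequence[float], direction: Sequence[float],
               length: float, n_points: int) -> List[Point]:
    """Sample n_points (>= 2) along center + t * unit(direction) for t in [0, length]."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    step = length / (n_points - 1)
    return [tuple(c + i * step * u for c, u in zip(center, unit)) for i in range(n_points)]


def grid_cost(points_per_axis: int, n_gates: int) -> int:
    """Points in a full raster scan: grows exponentially with the number of gates."""
    return points_per_axis ** n_gates


def ray_cost(n_rays: int, points_per_ray: int) -> int:
    """Points needed for n_rays one-dimensional scans of points_per_ray each."""
    return n_rays * points_per_ray


# Hypothetical setup: 8 rays of 100 points each from the origin of a 2D voltage plane.
rays = [ray_points((0.0, 0.0), d, 1.0, 100) for d in directions_2d(8)]
print(sum(len(r) for r in rays), "ray points vs", grid_cost(100, 2), "grid points")
```

The comparison grows sharper with more gates: `grid_cost` is exponential in the number of gates, while `ray_cost` depends only on how many rays are chosen, which is the motivation the abstract gives for replacing images with one-dimensional projections.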

Submitted 17 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 9 pages, 4 figures

Journal ref: PRX Quantum 2, 020335 (2021)

Showing 1–6 of 6 results for author: Eriksson, M